Hacker News
400 reasons to not use Microsoft Azure (azsh.it)
626 points by SlyHive 21 hours ago | 334 comments





Story time!

A couple of years back, I was working at Mojang (makers of Minecraft).

We got purchased by Microsoft, which of course meant we had to at least try to migrate away from AWS to Azure. On the surface, it made sense: our AWS bill was pretty steep (iirc into the six figures monthly), and we could have Azure for free*.

Fast forward about a year, and countless hours spent by both my team and the Azure solutions specialists kindly lent to us by the Azure org itself, and we all agreed the six-figure bill to one of corporate daddy's largest competitors would have to stay!

I've written off Azure as a viable cloud provider since then. I've always thought I would have to re-evaluate that stance sooner or later. Wouldn't be the first time I was wrong!


When I worked at Jet, a shopping website trying to compete with Amazon, we obviously did not want to give money to Amazon, so we used Azure.

For the most part it was just fine, until we started using CosmosDB (then called DocumentDB).

DocumentDB, in its first incarnation, was utterly terrible. The pricing was extremely hard to predict, so we would end up with ridiculous bills at the end of the week; the provided .NET SDK for it was buggy and horrible; but the very worst part was that the web UI appeared to be directly tied to your particular instance of CosmosDB.

Why is this bad? Because if you under-provisioned stuff for your database, it might start going slow, and it would actually lag the web interface you would use to increase the resources! We got into situations where we had to turn off the entire application just to bump up the resources for Cosmos. It felt like complete amateur hour from Microsoft.

My understanding is that Cosmos has gotten a lot better, but man that left a sour taste in my mouth. If I end up getting some free credits or something, maybe I'll give Azure another go but I would definitely not recommend it right now.


A team in my org worked with Jet for 2+ years to help y’all scale.

It was interesting seeing the biweekly status updates, they basically all started with “This is how Jet.com broke Azure core services this week”.

As much as it sucks, this was a deliberate strategy all the way from Satya - every employee knew Azure was a joke, but the only way to actually fix shit was to get internet-scale customers to break it daily and weekly.


> but the only way to actually fix shit was to get internet-scale customers to break it daily and weekly.

I don't get it. There's lots of distributed systems theory that could provide a more robust, analytical approach to a scalable architecture. If a system is regularly breaking like this, it sounds like it should be a "back to the drawing board" moment.


Back to the drawing board risks delaying your products by several years. Their strategy was probably the right one.

It sounds like their product has been considered crap for years, so they've had the time.

> My understanding is that Cosmos has gotten a lot better

For some values of "better", I guess. Performance is still terrible, their data visualization/inspection tools are shameful, their SQL dialect is finicky and has no error reporting beyond "something is wrong with the input value", and their official Python SDK has a race condition bug that can silently clear out your documents when under heavy load.

I used to work at a Cosmos-heavy house and I would utter "fucking Cosmos" around 15 times a day.


> My understanding is that Cosmos has gotten a lot better, but man that left a sour taste in my mouth.

A couple of years ago I stumbled upon an Azure project which had started off using the old-timey Cosmos DB. Looking at the repository history from those days, I saw a bunch of Entity Framework configurations and navigations and arcane wizardry that would take an engineer months to really figure out.

Then there was an update to CosmosSDK, and all that EF noise was replaced by single CRUD operations that took the unserialized object, id and partition key as input. That's it.

Worlds of difference, and a simple key-value query takes ~10ms to do.

Yes, it's worlds of difference.


> Worlds of difference, and a simple key-value query takes ~10ms to do.

Unless that query goes over the Internet to another continent, that's a really long time isn't it?


> Unless that query goes over the Internet to another continent, that's a really long time isn't it?

If you're hosting a service in a cloud provider and you implement your services so that your call to the cloud provider's database goes over the internet to another continent, you have serious problems but none of them are caused by the cloud provider.

Also, CosmosDB is globally distributed.


They meant 10ms is slow.

> They meant 10ms is slow.

I don't think anyone making that sort of claim knows anything about cloud services. A single roundtrip of a no-op request within the same data center takes 0.5ms. Add querying across multiple partitions and data seeking, and you don't find cloud providers doing better.

To frame how oblivious that claim is, DynamoDB is lauded for its response times being sub-20ms.

https://stackoverflow.com/questions/34552625/how-to-get-sub-...


No, that's exactly what I mean: I expect the round trip within a datacenter to take roughly 0.5ms, and I expect the key lookup to take about that time or less, so 10ms is roughly an order of magnitude more than I would expect a "simple key-value query" to take.

For comparison, if you run your own hardware and do a memcached KV lookup against a different server on the same rack, p99 times are slightly under 1ms. Given the guarantees of CosmosDB, ~10ms isn't that bad for a p100.

Yes it’s ages but not multi continent long. Us east 1 to us east 2 is about 10-15ms in my experience.

You're right, I exaggerated unnecessarily there. Sorry

No reason to apologise - I assumed and tried to give a better reference.

It is an absolute eternity though. A KV lookup is fractions of a microsecond in a managed language like C#. An HTTP request is in the microseconds range on localhost and a smidge more on a performant local network. A poorly behaved local network (busy wifi on an ISP router) is 1-2ms, and I can do a round trip to my nearest AWS region in 10-15ms from my home network.

It’s an absolute eternity, and when thinking about this stuff and scaling, think “is it worth slowing down the normal case by 1000x to introduce an external service”


Some numbers no one asked for :)

ConcurrentDictionary<K, V> read latency is going to be around 7-15ns for the data hot in the cache, scaling with key length if it is string-based, and anywhere between 75ns and 150ns for reading that out of RAM. Alternate implementations like NonBlocking.ConcurrentDictionary can reach down to 3.5-5ns for integer-based keys given all data is in L1 and the branch predictor is fully primed, on modern x86_64 cores.
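
The scale of those figures is easy to sanity-check yourself. Here is a rough, unscientific sketch in Python (the numbers above are for .NET's ConcurrentDictionary; a plain Python dict carries interpreter overhead, so expect tens to hundreds of nanoseconds rather than single digits, still many thousands of times faster than any network round trip):

```python
import time

# Time a large number of plain-dict lookups and report the per-lookup
# average. Loop overhead is included, so this overstates the raw cost.
d = {i: i * 2 for i in range(1_000_000)}

N = 1_000_000
start = time.perf_counter()
s = 0
for i in range(N):
    s += d[i]
per_lookup = (time.perf_counter() - start) / N

print(f"~{per_lookup * 1e9:.0f} ns per lookup")
```

Even with the interpreter in the way, the result lands several orders of magnitude below the ~10ms discussed above.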


> Because if you under-provisioned stuff for your database, it might start going slow, and it would actually lag the web interface you would use to increase the resources!

What the ever loving heck... seriously?! Why wouldn't this be a control plane API that reconfigures a data plane?!


I think it was querying the database to get usage information, and it was blocking the loading of the page as a result. It's been a few years, but if I recall correctly, the .NET API for changing the database provisioning did work, so some people would hack up a quick script to change resources in an emergency.

I think they did fix it eventually, because I know that multiple people on my team complained to Microsoft about it. Very short-sighted decision on their end, but to be fair it was a brand new product.


This is how hosted Elasticsearch works today, too. Really mess up your ES? Good luck with the web UI!

> What the ever loving heck... seriously?!

What's so surprising about that? If your CRUD operations take longer to do, and you're doing those to drive a GUI, of course the GUI will lumber along.


It's good practice to separate out your control plane and data plane, so in this kind of scenario you can use the control plane freely to manage and scale up the data plane without worrying about the data plane underresourcing affecting your control plane operations.

The reverse also applies; by separating them you can have issues with your control plane but not have the database go down.
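
As a toy illustration of that argument (nothing here reflects Azure's actual architecture; all names are made up), giving the control plane its own worker pool means a saturated data plane can't starve a scale-up request:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Separate worker pools: exhausting the data plane's pool does not
# block the control plane's.
data_plane = ThreadPoolExecutor(max_workers=2)
control_plane = ThreadPoolExecutor(max_workers=1)

def slow_query():
    time.sleep(0.5)  # stands in for an overloaded database

def scale_up():
    return "provisioning bumped"  # cheap metadata-only operation

for _ in range(8):  # saturate the data plane's two workers
    data_plane.submit(slow_query)

start = time.perf_counter()
result = control_plane.submit(scale_up).result()
elapsed = time.perf_counter() - start
print(result, f"after {elapsed * 1e3:.1f} ms")

# Don't wait around for the backlog of slow queries.
data_plane.shutdown(wait=False, cancel_futures=True)
```

If both operations shared one pool, the `scale_up` call would queue behind the slow queries, which is exactly the failure mode described for the Cosmos web UI.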


It shouldn't lag the web interface. There are common ways to handle this, such as making all resource mutations asynchronous, or if you want to keep it synchronous, ensuring that control processes get guaranteed CPU resources. It is not a particularly challenging thing to do.

Control plane and data planes should be running on different hardware.

You would assume that the UI process would be running on Very High priority.

Your story reminds me of when Microsoft acquired Hotmail in the '90s and they tried migrating from FreeBSD & Solaris onto Windows NT/IIS. Having the world's largest email service running on the Windows stack would have been a huge endorsement. It took years until they were successful.

https://www.zdnet.com/article/ms-moving-hotmail-to-win2000-s...

https://jimbojones.livejournal.com/23143.html


Ha, I worked on that project. That drove a lot of good requirements into Windows that set us up for web-based services (eventually).

Are you free to expand upon your role and perhaps some of the actual tech/fixes that made it back into Windows?

Seriously, Windows 2000 was one of the most stable OSes back in the day, rock solid. I used 2000 Server as a desktop OS instead of 98.

Unlike the shit show that was Windows 95/98/ME.


While I don't disagree with that, in my experience all Windows instability in the WinNT family (and I worked closely with all end-user versions of Windows, from the 16-bit 3.11 to the recent Win11, with very few exceptions) is caused by faulty hardware and/or bad drivers that can't handle it. I don't think I can remember any issue that I can't attribute to bad HW or a 3rd-party driver.

Wrt Win95 and its kind - all processes in that family essentially ran in a single address space, and data "isolation" was "achieved" only through obscurity. If you knew some magic constants, easily obtainable from disassembly, you could do anything there. So no wonder it was as bad as the worst program you'd installed.


Almost all instability I’ve had with modern Windows or Macs has been caused by corporate-installed malware - MDM software and antivirus software.

Haha, yeah, that crap indeed adds to it.

Windows 2000 Server was peak Windows. All the subsequent versions just got harder to maintain as they gradually ruined the user interface. Nobody cares about the UI on consumer Windows, but if you’re spending a lot of time in RDP, the Vista-based server products are terrible.

I don’t hate Windows 2019, but Linux is better, easier, faster, and a relief after any futile attempts to use IIS or SQL Server in 2025.


Windows XP x64 Edition was pretty slick, and so was NT4. I agree that 2000 was pretty cool, but perhaps a lot of that is design nostalgia. It was very "serious business OS" where XP and Me looked like jellybeans and cartoons. My favorite Windows, though, is Win 7 Ultimate, Steve Ballmer Edition. I was sad when I had to upgrade to winten.

ninja proof: https://i.imgur.com/l29rDVo.jpeg


I get the nostalgia for XP; it was the first Windows consumer edition that didn’t suck. But for a server OS, 2000 was so lean and easy to manage it makes me wonder how MS lost to Linux. Back then, it was a genuine competition; now you’d have to be crazy to choose Windows to deploy anything.

Windows Server still has its place. AD DS, file services, and SQL Server being the big ones. Linux doesn't have apps that do these things 'better'.

I wish MSFT could build Active Directory and the associated constellation of services on Linux. You can make a reasonable simulacrum with Samba but it isn't as well-integrated.

(My fever dream wish is for a "distribution" of NT that boots in text mode and has an updated Interix subsystem alongside Win32. Throw in ZFS and it would be awesome.)


Maybe, but Win 3.1 was good for me.

I never used Windows XP; I went from 2000 Pro to XP x64 Edition, which came out two years after XP did.

After SP2 the worst wrinkles were taken care of. Oh, and skip every second OS release, of course.

I'm not as much against Windows as I used to be, but I'm not budging off Ubuntu LTS, even though they too try really hard to rock the boat.


> vista based server products are terrible.

The first generation of tabletised 8/Metro interfaces made me audibly groan every time I had to RDP into machines running 2012.


The stuttering over RDP when the start menu animation tried to slide in the tiles was amazing.

Oh yes. I still have a client that has 2012 and it physically hurts to use

PowerShell was 2006, so I suppose the real "peak Windows Server UX" was 2016, when PS was relatively mature and came out of the box with the latest version.

If MSFT had backported servicing stack updates to 2016 it would still be usable. As it stands, it bogs down unreasonably when applying updates and needs lengthy DISM /Cleanup-Image processes to be run periodically to reclaim disk space.

I went from 98 to 2000 (rather than ME) and it was an amazing experience. It showed me what an operating system could be like. Of course, what I really wanted was Linux, but I didn't know better at the time.

> Seriously, Windows 2000 was one of the most stable OSes back in the day, rock solid. I used 2000 Server as a desktop OS instead of 98.

Really? Oh, compared to other Windows versions...

Because it never came close to the stability of OS/400, Netware 3, AIX, Solaris or even OS/2 v2.


I will fully agree on OS/400, of the operating systems and platforms I have worked with, it is by far the most stable.

That is easier to achieve when your operating system only runs on your own proprietary hardware. (No mess of millions of drivers to write, for one.)

It worked well for years without any sysadmin touching it.

Well my mom was trained to be the "sys admin", which meant rotating backup tapes.


I dunno how to compare stable to stable, but I ran Win2k for so long that I got bored with it (something like 5-7 years) and never experienced a single crash. This is coming from a Linux guy, btw… so I’m no Microsoft fanboy; just saying, it was as stable as any other stable OS.

Didn't mean to bash you, sorry.

I saw years of uptime on those systems, whereas Win2000 iirc needed a reboot for every single update of the OS, and even for applications like IIS or Exchange.

Compared to NT4 it was probably very stable, since I remember telling most clients to just shut it down Friday evening and boot it Monday morning cause the pre-SP4 NT4 could not stay up more than three weeks.

Compare that to the AS/400, where we pushed updates all over the country, without warning clients, to systems running in hospitals, and there was never even the slightest problem. It sounds irresponsible to do that today, but those updates just worked, all the time, and all applications continued to work.


> I saw years of uptime on those systems

This just means security updates were never installed.

(Or you claim that all those operating systems never had kernel-level security issues which seems doubtful...)


Since these systems were from the '90s, they indeed did not get security updates.

Most were only locally connected (for example OS/2 had a Token Ring in one building). The WAN connection (for AS/400) was trusted.


You are comparing supermarket apples (Windows) with locally grown plums (AS/400). Even today, Windows is not able to update Office without closing it.

Like IIS running some part of the code in the kernel? ( http.sys ) :x

It has its advantages… but it wasn’t done until Server 2003.

https://learn.microsoft.com/en-us/iis/get-started/introducti...


> It has its advantages

Yeah, the advantages (RCE) were copied by modern web browsers. /s


So it is one of the most successful examples of dogfooding in history?

> Windows that set us up for web based services (eventually)

...and then .NET and SQL Server started shipping for Linux.


SQL Server is really Sybase tho, which was always capable of running on UNIX.

Can't say much more, but way back I worked on a huge (internal) app based on Sybase ASE on Linux (you've _all_ bought products administered on this app ;) ): pre-SSD, multi-path fiber I/O to get things fast, failover, etc. T-SQL is really nice, as is/was ASE and the replication server. It's been about 20 years tho, so who knows.


I worked with SQL Server a bit, writing a Rust client for it back in the day. The manual is really good, explaining the protocol clearly. That made it really easy to write a client for it.

Can't say the same for Oracle...


SQL Server uses NT and Win32 APIs, so the SQL team built a platform-independent layer. Meaning NT and Win32 are still used by SQL on Linux. It’s pretty cool tech.

https://www.microsoft.com/en-us/sql-server/blog/2016/12/16/s...


I used to work at a Sybase shop in the late 90's. It was way nicer to work with than Oracle!

There are two decades between those two points; .NET was -4 years old at the first one.

Pre-Microsoft Hotmail is one of the things I miss about the 'old' internet: logging in with Navigator 3.something in the library at uni.

Links in the original are dead but I think this is the Microsoft doc on “what could Windows do better” - https://web.itu.edu.tr/~dalyanda/mssecrets/hotmail.html

Thanks for linking to this.

The tone and content of this document is shockingly candid and frank. I think it did a ton to make Windows Server a better product. I have a lot of respect for the people at MSFT who reviewed the company's own product in such a critical light.


Businesses are theoretically all about money but end up being driven by pride half the time.

Of course. Why would you expect anything but? Pride is actually a very good driver of change if you ask me because people often do their best work when they are proud of what they are building.

Perhaps ego is a better term. Concretely, why migrate Hotmail from Unix to Windows except due to ego? The NPV has to be negative here.

Makes sense to me. After all, businesses are run by humans, who have egos to satisfy.

Engineers might take pride in their work, but on this level in a big organization, I rather suspect turf wars as motivation.

> It took years until they were successful.

The 90s were the dark ages of cloud computing. It was the age of the system administrator, desktop apps, Usenet, and the start of the internet as a public service. At the time, concepts such as infrastructure as code, cloud, and continuous deployment were unheard of.

AWS, which today we take for granted, launched in 2002, and back then it started as a way to monetize Amazon's existing shared IT platform.

Of course migrating anything back then was a world of pain, especially when it's servers running on different OSes. It's like the rewrite from hell, which can even cover the OS layer. Of course it takes years.


> At the time concepts such as infrastructure as code, cloud, and continuous deployment, were unheard of.

There existed different names and solutions for things like cloud. I worked with Grid Engine in 2000, after Sun acquired Gridware, but that project started in 1993. By 2000 we were experimenting with running StarOffice on the grid and serving the UI to thin clients (kind of what Google Docs or Office 365 do now, but on a completely different stack).


IIS was wide open in the Win NT/2k days. It took Microsoft some good years to patch the holes.

You've got me curious: what was the single biggest barrier to migration, if you're able to disclose it? I'm guessing it was something proprietary to AWS, like how they handle serverless or something that couldn't translate over directly, but I'm always eager to learn why a migration from X to Y didn't work.

This is a couple of years ago, so I fully expect most of the issues we had back then to be fixed by now, but it was definitely Azure that was the problem.

We wanted to use their hosted Kubernetes solution (I forget the name) and pods would just randomly lose connection to each other; everything network-related was just ridiculously unstable. Host machines would report their status as healthy but then be unable to start any pods, making scaling out the cluster unreliable. I also remember a colleague I regarded as a bit of a wizard being very frustrated with CosmosDB, but I cannot for the life of me remember what the specific issue was.

Our solution was actually quite well written, if I do say so myself: we had designed it to be cloud-agnostic, just on the off chance that something like this would happen (there may have been rumours this acquisition would happen ahead of time).

But Azure was just utterly unable to deliver on anything they promised, thus the write-off on my part.


Ohh, AKS. I had the 'pleasure' of using it quite early on, 5/6 years ago. We kept killing it by actually using it with more than 50 pods. You know you're one of the few serious users when you get through the 3 layers of support and get to talk to the actual developers in the US.

But my impression is that it's better now. In general, my experience with Azure is that the base services, those that see millions of hours of use, are stable. Think VMs, storage, queues, etc. But the higher up you go in the stack, the fewer hours of use they see, and the lower the quality gets.


I appreciate the context! Honestly, every time I start thinking that K8s might be the "universal cloud" orchestrator I've been hoping for/working towards, stories like this remind me just how...tenuous it can be relative to traditional VMs and standard containers-as-appliances. Still working to get my Admin cert for the spec, but it's definitely not something that sparks joy, if you catch my drift.

Thank you for the insights!


Pods losing connection to each other is still very, very common in Azure.

To see the positive side of it, it’s a Chaos Monkey test for free. Everything you deploy must be hardened with reconnections, something you should be doing anyway.

What’s frustrating is that it happens rarely enough to give the illusion of being stable, making it easy for PMs to postpone hardening work, yet often enough to put a noticeable dent in your uptime if you don’t address it. Perfect degree of gaslighting.
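
A minimal sketch of the kind of hardening being described, with made-up names (this is generic retry logic, not any Azure SDK): retry transient failures with exponential backoff plus jitter, so a whole fleet doesn't reconnect in lockstep after an outage.

```python
import random
import time

class TransientNetworkError(Exception):
    pass

def with_reconnect(fn, retries=5, base_delay=0.01, rng=random):
    """Call fn, retrying transient errors with jittered exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except TransientNetworkError:
            if attempt == retries - 1:
                raise  # budget exhausted, surface the error
            time.sleep(base_delay * (2 ** attempt) * rng.random())

# Hypothetical stand-in for a network call that drops twice, then succeeds.
calls = {"n": 0}

def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientNetworkError("pod lost its peer")
    return "ok"

result = with_reconnect(flaky_call)
print(result)  # "ok", on the third attempt
```

Wrapping every outbound call this way is cheap insurance whether or not your platform drops connections as often as described above.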


> To see the positive side of it, it’s a Chaos Monkey test for free. Everything you deploy must be hardened with reconnections, something you should be doing anyway.

Keep in mind that in Azure it's a must-have, whereas everywhere else it's either a nice-to-have or a sign your system is broken.


Having worked on the infra side of azure, I'm not surprised. Network is centrally managed and that team was a nightmare to deal with. Their ticket queue was so bad they only worked on sev 1 and the occasional 0. Nothing else got touched without talking to a VP and even then it often didn't change things.

At that point, aren't you better off actually making your own services regularly disconnect clients so you iron out all of the reconnection bugs?

Curious about details too. The parent's conclusion is to write off Azure, but I wonder if it's actually AWS or the way they use AWS that makes it hard to migrate.

Or to put it another way: if Mojang were to start with Azure but couldn't manage to migrate to AWS, which provider is the parent going to write off?


My experience has certainly been that AWS is both a) more stable, and b) has more migration resources and guides than the reverse.

It was easier to go to AWS than to Azure; and I've done both in the past ~4 years. Migrating to AWS was just technical work. Migrating to azure was 'fight unexpected bugs and the fact it doesn't actually work in surprising situations'.

The only reason to go to azure was that Microsoft waved a big fat $$$ discount to use their services.

Migrating to AWS was a breeze.


Yup, the hardest part about migrating to Azure was jumping between all the managed services that work everywhere else but are insanely buggy on Azure. We ended up with the most basic architecture you can imagine (other than AKS, which works great as long as you don't use any plugins) and we're still running into issues.

We have a very long list of Azure features and services that we've banned people from using.

Just got off a call with someone at Azure today who told us to set up our own NAT gateway instead of using Azure's, because of an outage where we made too many requests and then got our NAT Gateway quota taken away for the next 2 hours.


To be fair AWS also has quotas on NAT Gateways.

Maximum 55k concurrent connections. After that, they make you deploy NAT gateways in other availability zones. And a max throughput of 10 Gbps.

I imagine AWS would also tell you to deploy your own gateway if you were running into the 55k concurrent connection limit of managed NAT.

AWS tends to be flexible with their quota enforcement in my experience, though.


Yea, I don't have a problem with the quota, more that the "out of quota" throttling lasts 2+ hours even after the traffic spike dies down.

> Yup, the hardest part about migrating to Azure was jumping between all the managed services that work everywhere else but are insanely buggy on Azure.

Care to point out a concrete example? I've worked with Azure a few years ago and I wouldn't describe it as buggy. At most, I accuse them of not "getting" cloud as well as AWS. For example, the whole concept of a function app is ass-backwards, vs just deploying a Lambda with a specific capacity. That is mainly a reflection of having more years working with AWS, though.


My experience with AWS is the documentation can be bad/wrong, it can be difficult to find stuff, but the actual services are very solid. It does what you tell it to do. Stuff works.

My experience with Azure is that it simply breaks in ridiculous and frankly unacceptable ways, all the time. It's like someone is unplugging network and power cables every couple hours just for fun.


> My experience has certainly been that AWS is both a) more stable, and b) has more migration resources and guides than the reverse.

I'm not sure how well founded the stability argument is. I still remember the infamous series of AWS outages that took place a couple of years ago.

The fact that AWS invests heavily in vendor lock-in is a problem created by AWS, not their competitors.


… I was scrolling back through the “400 reasons not to use azure” in the OP and off the top of my head I’ve seen a dozen of them personally.

You decide if that’s more or less stable than AWS.

I’d say the evidence is pretty empirical, but hey, all I can say is my experience was utterly unambiguous.

You can argue a lot of things, but hundreds of azure fails in a big giant list is probably one of the tougher ones to go “no, this is fine compared to AWS!” about, imo.


The networking is so, so bad in azure. We ran into all kinds of craziness with simple things that kept kicking us in the nuts like port exhaustion between different subscriptions. I quit my last job partly because they were wedded to azure.

Current Mojang employee here, we moved fully onto Azure as of a few years ago AFAIK.

Some more game-oriented technologies of course have helped in the years since though.

Edit: AWS -> Azure :)


Oh, you guys finally managed!? Cool to hear! I guess Azure must've gotten better then; back when I was there, the conclusion was that Azure simply wasn't mature enough to host Minecraft yet.

Again, this is a while ago: I remember when I started we were just starting to replace the old yggdrasil servers with the new micronaut based system which I think is still in use today?

I still remember that application fondly as the best-architected piece of software I've ever worked on. I hope all is well!


> Current Mojang employee here

Can I have my Mojang account back?

I literally cannot log in to it after it was forcefully migrated to Microsoft. Microsoft doesn't recognize my computer as not-a-bot. Something to do with being on Linux, I imagine.

Or, can I get a refund?


Oof, this won't help you, but I recall sending a very angry email to my new corporate overlord in the Xbox org for blocking me from using their web services if my user agent said Linux.

This was during the "Microsoft <3 Linux" campaign, and I think I cited that and then told em Minecraft would not be able to move forward with Xbox account migration until they stopped such idiocy.

Since I was the dev tasked with migrating Mojang accounts to Xbox accounts, I felt I had at least SOME credibility to my claim that it was blocking me.

But honestly, modifying my user agent was easy, it just pissed me off.

They did fix that the same day tho, so I guess they believed me!


Moved my Minecraft license to a new Microsoft account as required. Microsoft account was flagged and blocked when I checked a few days after. And that's how I got scammed out of my Minecraft license.

It wouldn't have helped if you hadn't, since they've now deleted all non-Microsoft accounts.

Reminder to whoever is reading: if you bought the game during alpha, you have the right to all future Minecraft games and a premium account forever. Microsoft barely tried to uphold this by giving a free Bedrock license to alpha buyers for a limited time several years ago. I suppose you'd have to sue them now if they break it, and the judge will wonder why you bothered to bring a $20 dispute to court.


Their bot detection systems are _extremely_ overzealous and don't even _tell you_ they're denying you nor offer ANY recourse to correct the problems.

I think I finally managed to make it work in a new/clean Chromium browser profile.


Every time I've made a Microsoft account it's let me create it, then two days later locked me out if I don't give up my phone number.

The first time it was a throwaway account only needed for one day so I abandoned it. The second time I had to give them my phone number. Very scummy.


I bought my kid Minecraft and spent two hours trying to get it set up to run on a Windows 10 machine.

In the end I gave up and he plays it on an old Ubuntu laptop instead.


This is a story from some parallel universe right here. You bought a Microsoft game and weren't able to run it on Windows, but it works on Ubuntu?!? I almost spilled my coffee reading this.

The problem was you probably ended up falling for the UWP version instead of the Java version. The Java version remains the community accepted “proper” version to this day.

How did it take two hours to install on windows 10?

It probably didn't work right away, and the two hours were spent on troubleshooting.

> we moved fully onto AWS as of a few years ago

Did you mean Azure?


Yes, I do. Whoops!

It has since migrated to Azure. I suspect there was a gap in the technology that was since closed, as AWS certainly had a head start in general.

Yep, sounds like my experience. Years ago, we migrated off Rackspace to Azure, but the database latency was diabolical. In the end, we got better performance by pointing the Azure web servers at the old database that was still in Rackspace than we did trying to use the database that was supposedly in the same data centre.

I kicked up a stink and we migrated everything to AWS in under a week.


IMHO, you're gonna struggle if you move anywhere else from AWS. We're migrating to GCP and there are gaps all over the place.

That highly depends on what services you're using.

We migrated from AWS to GCP in 2016/2017 (mostly VMs and related stuff, CloudFront, etc - no lambdas) and it was pretty painless and everything worked smoothly until the end of that company.


is it because there are features that AWS provides to you that are not available in GCP, or just the fact that setting up exact replicas of processes is hard for migrations like these?

Out of curiosity, what are the biggest gaps you've hit in GCP?

This is the reason I try to avoid proprietary bullshit services. Use EC2, Postgres, and S3, and you’ll be fine in any cloud or even on bare metal.

That sounds to my self-hoster ears as an expensive way to do self hosting. Isn't at least half the point of AWS to use their SaaS thingies that all integrate with each other, I think people now call it "cloud native software"?

Not that most of our customers whose AWS environment we audit do much of this, at least not beyond some basics like creating a vlan ("virtual private cloud"), three layers of proxies to "load balance" a traffic volume that my old laptop could handle without moving the load average above 0.2, and some super complex authentication/IAM rules for a handful of roles and service accounts iirc

(The other half of the point is infinite scale, so that you can get infinite customers signing up and using the system at once (which hopefully pay before your infinite AWS bill is due), but you can still do that with VPSes and managed databases/storage.)


The point of moving to AWS is often to benefit from their data centers and reliability promises. So VPC, EC2, IAM, maybe S3 have a clear point.

And one small note: apart from S3, virtually all AWS services are tied to a VPC, any kind of deployment starts with "ok, in what VPC do you want this resource?".


You can get 95% of the reliability for 10% of the price at any dedicated hoster. AWS just figured out the magic word "cloud" means they can charge you 10 times the price.

At Azure or GCP you pay a similar price but you don't even get the reliability so literally why would you use them? The only reason I see is that "cloud" means you can start instances at any time without a setup fee or contract duration. But with the amount of cost difference, you could have three times your baseline "cloud" load running all the time at a non-cloud hoster, and still save money!


It is an expensive way to do self hosting, yeah! I guess one reason is, sometimes it’s easier to just use one of big N clouds – e.g. if you’re in a regulated industry and your auditors will raise brows if they see a random VPS provider instead. (Or maybe not? If you’re doing that kind of audits I’d love to hear your thoughts!)

> Isn't at least half the point of AWS to use their SaaS thingies

It is. (That’s how they lock you in!) I think it’s okay to use some AWS stuff once in a while, but I’d be wary of building your whole app architecture around AWS.

I’m in the self-hosters camp myself :-) I’m building a Docker dashboard to make it easier to build, ship, and monitor applications on any server: https://lunni.dev/


This is sort of petty, but… your web page does the horrible scroll-and-watch-the-content-fade-in thing. This is annoying and makes me want to stop reading. It also makes me wonder if your product does the same thing, which would be a complete nonstarter. Seriously, if I’m debugging some container issue, I do not want some completely unnecessary animation to prevent me from seeing the dashboard.

Thanks for the feedback! No, the product’s UI is definitely on the pragmatic side: I think the only blocking animation we have right now is dialog windows sliding in (and we try to avoid these altogether!) Both the landing page and the app disable animations when prefers-reduced-motion is enabled.

I’ll rethink the landing page animations a bit later! (I was thinking about redoing it from scratch again, anyway :^)


I’ve just dropped you a note in the chat. We’re also on Swarm, and dealing with most of the stuff you address, and some more. Would love to contribute to Lunni, if you’re open to that :)

I’m personally thoroughly unimpressed by the bare metal S3 implementations I’ve tried.

And there’s also an issue with client libraries and their compatibility with various implementations. I recently discovered this issue:

https://github.com/boto/botocore/issues/3394

This is GCS implementing the S3 API incorrectly in a way that really ought not to break clients, but it’s still odd because the particular bug on GCS’s end seems like it took active effort to get wrong. But it’s also boto (the main library used in Python to access S3 and compatible services) doing something silly, tripping over GCS’s bug, and failing. And it’s AWS, who owns boto, freely admitting that they don’t intend to fix it, because boto isn’t actually intended to work with services that are merely compatible with S3.

As icing on the cake, to report this bug to Google, I apparently need to pay $29 plus a 3% surcharge on my GCS usage. Thanks.

Time to check out OpenDAL, I suppose.


> As icing on the cake, to report this bug to Google, I apparently need to pay $29 plus a 3% surcharge on my GCS usage. Thanks.

That's the price of a support contract, not a "bug report". And it's not "plus", it's "or": support costs $29/month or 3% of your monthly billing, whichever is greater. It comes with SLA agreements for fixing or working around your reported problems. Though obviously in this case they'll probably just tell you to use their own python library and not boto.


Oh, thanks Google, if my cloud spend is more than $967/mo, then I don’t get dinged by the $29 minimum. But this is the price of a bug report, because I can’t file a bug report without paying it.

And this situation is bad business. Google advertises that GCS has S3 interoperability support. And they have customers who use it in its interoperable mode. Presumably those customers could use GCS’s biggest competitor, too. Shouldn’t Google try to make the S3 interop work correctly?
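For concreteness, the pricing rule quoted above works out like this (a quick sketch; the $29 minimum and 3% rate are taken from the sibling comment, "whichever is greater"):

```python
# Sketch of the support pricing described above: a $29/month minimum
# or 3% of monthly spend, whichever is greater (per the sibling comment).
def support_cost(monthly_spend: float) -> float:
    """Monthly support fee under the quoted rule."""
    return max(29.0, 0.03 * monthly_spend)

# Breakeven: the spend at which the 3% share equals the $29 minimum.
breakeven = 29.0 / 0.03

print(round(breakeven, 2))    # 966.67 -- the ~$967/mo figure above
print(support_cost(500.0))    # 29.0 (minimum applies)
print(support_cost(2000.0))   # 60.0 (3% applies)
```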


GCS and AWS are commercial products for which you pay, not open source projects you can expect to support you for free. I don't know what to say here, if you have something "serious" to do on these platforms then a 3% overhead for support seems like an obvious choice.

Honestly it seems to me like you're excited to have found a bug and want to report it for glory; we've all been there. But No One Cares about that stuff in the world of commercial software. They fix bugs for real customers, not internet rock stars. If you aren't losing even $29 (one mid-tier meal!) from this bug, well... does it even rise to the level of "yell about it on HN?".


> [...] and S3, and you’ll be fine in any cloud

Except Azure - AFAIK it's pretty much the only cloud provider that doesn't support S3 API, see e.g. https://learn.microsoft.com/en-us/answers/questions/1183760/...


It is also the reason to use proprietary bullshit services. If there were no utility gap, then a reasonable evaluation would conclude that migrating to them is not worthwhile.

Now of course, that's a very sizable if...


> proprietary bullshit services

> S3

Hasn't this recently been an issue where Amazon arbitrarily changes the S3 contract and all software following it as a spec has to play catch-up?


I think this happened a few times, but given there are many S3 clients, they can’t keep doing that or there will be a backlash.

Although perhaps it’s about time we made a proper standard based on S3, yeah.


It has, but that's not exactly on S3 is it?

The S3 system is proprietary to Amazon, and it's your fault if you're not using Amazon but you're relying on Amazon to not change it anyway, because they have no obligation to you.

The concept of object storage is not proprietary. You should be able to change your code to use a different object storage provider.
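A minimal sketch of what that portability can look like in practice: code against a small object-storage interface of your own, and keep each provider behind an adapter. The `ObjectStore` and `MemoryStore` names here are illustrative, not from any real library; a real adapter would wrap boto3, google-cloud-storage, or azure-storage-blob behind the same three methods.

```python
# Sketch: application code depends on a tiny interface, not a provider SDK.
from typing import Protocol

class ObjectStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...

class MemoryStore:
    """In-memory stand-in, useful for tests and as the interface reference."""
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data
    def get(self, key: str) -> bytes:
        return self._blobs[key]
    def delete(self, key: str) -> None:
        del self._blobs[key]

def archive_report(store: ObjectStore, report: bytes) -> None:
    # Call sites only see the interface, so switching providers means
    # writing one new adapter class, not touching application code.
    store.put("reports/latest", report)

store = MemoryStore()
archive_report(store, b"q3 numbers")
print(store.get("reports/latest"))  # b'q3 numbers'
```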


> Fast forward about a year, and an uncountable amount of hours spent by both my team, and Azure solutions specialists, kindly lent to us by the Azure org itself, we all agreed the six figure bill to one of corporate daddy's largest competitors would have to stay!

Being tied to AWS and being unable to shake off a huge bill is not a trait of its competitors. It's a trait of AWS, and stresses the importance of not picking them as your cloud provider.

Also, I think it's unbelievable that a monthly 6-figure invoice charged to a company that already has cloud engineers on its payroll is not justification enough to shift their workload elsewhere.


Low 6 figures is just a dev. If a team of 5 devs has to work on the problem for 5 years, then it will pay for itself in 25 years. Likely beyond any planning horizon of a company with yearly performance evaluation cycles.

> Low 6 figures is just a dev.

...in the US.

In Europe, even the likes of Amazon pays its SDEs 70k/year. In Sweden, for example, Microsoft pays its SDEs south of 800k SEK, which is about 70k dollars/year.

Low 6 figures is an entire team of Microsoft SDEs working full time for a year.


Interesting, because until now I've had the same opinion in reverse.

Each case is special in how everything gets configured, but between the Azure, GCP, AWS, and IBM clouds, the smoothest experiences in my case have been on Azure, based on Java and .NET technologies.

And we also have our share of support tickets across all of them.

Now, Azure back in its early days, 2010-2016, was kind of rough; maybe that's the timeframe you're referring to?


Do you think it would be any different/better if you had to migrate to GCP (for example)?

Do you think it was the migration itself or the services on Azure?

Having worked with all three, there are certainly things that suck about all of them, but I've found AWS the "most reliable", though it also seems to have a large number of disparate services needed to do things that were simpler on Azure.

GCP was pretty meh, but depends on what services you used.

Azure is a good choice for .NET and SQL Server (Azure SQL or whatever it is now), but I'm not sure a service built for AWS is going to "just work" on Azure (or vice versa).


A great story of MS incompetence and Amazon’s vendor lock in.

After using AWS and Azure extensively, AWS seems to be quite well engineered by some very smart people. The isolation between regions is extremely good, and the Availability Zone model is quite effective for building very reliable systems if you are willing to pay the cost of inter-AZ data transfer. My company has an Active Directory controller in 3 different AZs.

Azure is a mess designed by smart people with no time and little budget. Azure flat out lies about the AZs they have by claiming two halves of one data center are two AZs.


>Microsoft acquired Hotmail in the '90s

it was around 2007 (I'm not that old!)


https://en.m.wikipedia.org/wiki/Hotmail

> Founded in 1996 by Sabeer Bhatia and Jack Smith as Hotmail, it was acquired by Microsoft in 1997 for an estimated $400 million

Wrong decade, they really did acquire Hotmail in the 90's.


That's okay, Microsoft will rename it to something else and completely change the admin UI and APIs next week. It will now be called Dynamics CoPilot OneAI 365 for Business OneCloud.

+fabric

But the documentation and every other reference to it will retain the old name.


But some URLs will still be on "live.com", others on "outlook.com", others on "sharepoint.com", others on "msbinbows.com", others on...

take my angry upvote

In my experience (worked for organizations that used everything from on-prem server racks, to Linode to AWS to Azure), complaints about cloud infrastructure are proportional to managed service usage. I rarely hear teams that largely rely on virtual machines (perhaps with a managed RDBMS) complain. They do have to maintain a little extra scripting, but that's a minor inconvenience compared to battling issues and idiosyncrasies of managed services.

I'm sure it's gotten better now but back in 2016 provisioning VMs in Azure took so long that we joked that every time you provision an instance a Microsoft engineer gets in a car to buy the server.

Reminds me of how my Swiss bank doesn't support transfers outside business hours. I have to imagine when I click "send" in the UBS app, some guy named Hans-Ueli receives a tape printout and goes into the basement to move some silver pieces from one drawer to another.

If everyone in this thread shitting on Azure is going off how it worked in 2016 the comments here make a lot more sense. I know Microsoft bad still lingers in online communities but I have to say I’m surprised hackernews is still this anti Microsoft. In my experience, both Azure and AWS have their issues, it’s not like AWS is some perfect offering but you’d think that based on the comments.

I'm a little confused by this post. Obviously it's easier to maintain a plain VM than managed services. That's why people are paying a lot more money to the cloud providers for managed services, so they don't have to do it themselves. What you're saying is that this is essentially a pointless endeavor? I don't think this statement is entirely uncontroversial, since managed services are the main reason for many companies to migrate to cloud.

Using managed services is not a pointless endeavor – they can save you a lot of time (and therefore money).

Unless you need to switch providers, at which point it may take more time to adjust for differences in how those managed services operate.

Managed services are absolutely not the main reason for moving to the cloud. Companies do it for the flexibility that comes with renting the real estate/energy/hardware instead of owning it.


We use managed services, but only those that are managed versions of pre-existing software.

For example, we use Managed Postgres, but not Azure's or AWS's home-grown databases.

Makes migrating much easier.


Yes! The longer response is that the closer you stick to standards the easier of a time you will have. VMs are a standard with cloud-init and image formats, etc.

i.e. in 2025 managed Kubernetes is not _that_ different between providers
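For example, the same cloud-init user-data can bootstrap a VM on any provider that supports the standard. A minimal sketch (the package names and Docker example are illustrative, Debian/Ubuntu-style):

```yaml
#cloud-config
# Portable first-boot config: the same user-data works on AWS, GCP, Azure,
# Hetzner, etc., because cloud-init is the de facto standard.
package_update: true
packages:
  - docker.io
runcmd:
  - systemctl enable --now docker
```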


Heh, until you need to rollback a specific table in postgres using their backup solution. IIRC, this is possible in AWS -- or at least, I'm 99% sure you can at least download the backup. In Azure? All you can do is restore the entire database, and you cannot download it.

You can def download it

I mean, if it was solely about renting machines, we’d all just use DigitalOcean, or EC2 on AWS.

People use things like RDS and EKS/GKE to avoid all the administrative overhead that comes with running these things in prod. The database or its underlying hardware has a problem at 1am? It’s Amazon engineers getting paged, not you (hopefully… assuming the fault hasnt materialised to operational impact yet)


I've never seen a managed IaaS that saved time. It is marketed as something that can free you from hiring ops people, but you will absolutely need to hire some supplier relations people to deal with it. (And contract optimizers, and internal PR to deal with the fallout.)

It's different for fully featured SaaS. It's a matter of the abstracted complexity vs. interface complexity ratio that is so common for everything you do in software.


It's easier for the provider to maintain a VM provision. It's supposed to be easier for the customer to maintain managed services, but that's often debatable.

Cloud services are great when they work. But when something isn't working, you have no way to debug anything, except maybe restarting the service, if that's even possible.

We had one customer that needed an IPsec tunnel to the VPC where the production servers were living. We didn't want to maintain such a setup just for a single customer, so we checked AWS's offerings, and look at that, they have a managed IPsec solution. Great.

Until the client called saying the tunnel was down, and the solution was that they needed to restart it on their end to resume the connection. Why? You can enable some logging to S3, but according to that, everything should work. What should we do next?

But even if you stick to just EC2, things can go weird. Our recent incident: an EC2 instance stopped responding but the ASG didn't replace it, and any action on it threw an error that the instance was not running, even though it was in the running state.


I wish I could better help my org see that. Luckily my boss agrees with me, but he's not in full control. Between the vendor lock-in, and the _almost but not quite api compatibility_ with OSS... I just dread as more teams adopt it.

"But it's easier!" ... yeah, we'll see...


Azure's anti competitive conduct is also the reason that AWS stopped lowering prices.

Before 2014 or so, AWS would periodically reduce prices on major services passing on falling technology costs.

Azure didn't like that, so they aligned their prices to AWS's, matching immediately the same discounts on the same service.

This is a form of predatory pricing, because the goal is to kill the incentive for competitors to reduce prices by denying them market share gains when they do.


"Show us a better price and we'll match it" is not a new tactic nor exclusive to clouds.

We bought a company hosting on Azure. They used hosted Postgres and are hosting .NET services on Windows. Small infra, in the range of 2-3 hundred cores and 1 TB of memory. Every few days M$ randomly shuts down instances for maintenance and disconnects the network for >10 minutes.

Migrated off hosted Postgres because performance was a tragedy - now their India-based expert led us to use a different volume type, and after an instance restart the database didn’t start up because of I/O latency. The expert hasn't wanted to meet for 3 straight days now, because he is busy. The RCA (half a page, probably written by some LLM) says it’s not their fault, but the charts tell a different story.

The only thing where they beat GCP and AWS is a dashboard that loads everything so quickly... sad that you can’t, e.g., run 2 similar network operations in parallel, because they will fail or take 10x the time they would take when run one after another.

Run, don’t use.


Of all the PaaS providers, Azure has the worst abstractions and services.

In general, I think it’s sad that most buy in to consuming these ”weird” services, and that there are jobs to be had as cloud architects and specialists. It feeds bad design and loose threads, as partners have to be kept relevant.

This is my take on the whole enterprise IT field though!

At my little shop of 30 or so developers, we inherited an Azure mess, built abstractions for the services we need in a more ”industry standard” way in our dev tooling, and moved to Hetzner after a couple of years.

A developer here knows no different, basically - our tooling deals with our workflows and service abstractions, and these shouldn’t change just because new provider.

1/10-th of the monthly bill, and money partly spent on building the best DX one can imagine.

Great trade-off, IMO!

Only two cases come to mind for using big cloud:

- really small scale: mvp style

- massive global distribution with elasticity requirements.

Two outliers looking at the vast majority of companies out there.


Oh boy, do I hate Azure with a passion.

I am part of a team building an automation tool for cloud provider creation of Projects/Accounts/Subscriptions (depending on provider). Our primary provider is GCP, and implementing that was fairly easy. Some gotchas, but easily surmountable.

Now we have gone multicloud to Azure and we need to add support for it (we were historically on AWS but moved 95% off; we still have some teams on there, but we rarely build tools for it outside of Terraform modules). And the Azure API, MS Graph API, and the Go SDKs for both are the biggest piles of trash I have ever worked with. Everything is a pointer, even string literals need to be made pointers, but sometimes they aren't....

Documentation is inaccurate. Some APIs take just the ID, others take a full path. Some of it is documented; many APIs have the wrong one documented.

None of the APIs return related resources' IDs; you have to search for all of them. So many name-based searches. I had to add a caching layer for IDs during creation so I didn't have to look up the same resource over and over (we use a state machine for creation, and it can be resumed midway and other fun things, so we need a lot of checks and resume-based code).
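The caching layer mentioned above can be as simple as memoizing name-to-ID lookups. A hedged sketch (the `lookup` callable is a stand-in for a real, slow name-based cloud API search, not any actual SDK call):

```python
# Sketch: memoize name -> resource ID so a resumable state machine
# doesn't repeat the same search API calls on every step/resume.
from typing import Callable

class IdCache:
    def __init__(self, lookup: Callable[[str], str]) -> None:
        self._lookup = lookup
        self._ids: dict[str, str] = {}
        self.misses = 0
    def resolve(self, name: str) -> str:
        if name not in self._ids:
            self.misses += 1               # only hit the API on a cache miss
            self._ids[name] = self._lookup(name)
        return self._ids[name]

# Usage: repeated resolutions during a resumed run cost one API call total.
cache = IdCache(lambda name: f"id-for-{name}")
cache.resolve("prod-vnet")
cache.resolve("prod-vnet")
print(cache.misses)  # 1
```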

Overall it is the worst designed and implemented cloud provider. I would never recommend or choose it if given the power.


SDKs are generated, but for some reason the Go implementation sucks. Internal stuff written in Go is usually great.

I really wanted to like Azure because of how well it integrated with the rest of my tools, but I kept getting hit with VM availability limitations and UX quirks. I've never had issues getting machines in AWS, or feeling like my actions were taking effect.

I've also waffled several times on the Azure FaaS offering. I am now firmly and irrevocably at "Don't use it. Run away. Quickly.". The experience around Azure Functions is just too weird to get comfortable with. Getting at logs or other binary artifacts is a gigantic pain in the ass. Running a self-contained .NET build on a blank windows/linux VM is so easy it doesn't make sense to get in bed with all this extra complexity.


Ugh, yes. Lack of availability of resources in whichever region I happen to need them.

Also, things that break automation, like calling back to say your SQL server is up and running when in fact it’s not ready for another 20 minutes. I'm half sure the terraform time_sleep resource was written specifically to counter Azure problems.
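Rather than a fixed `time_sleep`, one workaround is to poll the resource until it actually responds, with a timeout. A sketch, where `fake_probe` stands in for a real readiness check (e.g. attempting a SQL login):

```python
# Poll-until-ready instead of sleeping a fixed amount: the resource is
# "ready" when the probe succeeds, not when the provider says it is.
import time

def wait_until_ready(is_ready, timeout: float = 1800.0, interval: float = 0.01) -> bool:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready():
            return True
        time.sleep(interval)
    return False

# Simulated resource that reports "running" before it can serve connections:
state = {"checks": 0}
def fake_probe() -> bool:
    state["checks"] += 1
    return state["checks"] >= 3   # becomes ready on the third probe

print(wait_until_ready(fake_probe, timeout=5.0))  # True
```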


You missed the perfect middle ground between serverless and mouse-configuring an IIS VM: Azure App Services. It's the same service function apps are using once they advance beyond the trivial function and require longer runtimes or no spinup delay.

App Services takes some getting used to, but it's a locked down Win Server/IIS container with built in FTPS, self-healing healthcheck endpoints, deployment by pointing to a repository, auto-scaling options, and a 99.95 SLA.

A few years back, it was a bit of a dog performance-wise, but the modern CPUs have been no problem for a 2+ vCPU, Premium level SKU. Pricier than a VM, but dealing with security and updates for a webserver VM is a ton of work.


App Services can also run as a Linux container, should you not want Windows.

But the Windows containers have more features. I stuck with them for quite a number of websites.

Significantly cheaper than a VM as you noted just based on maintenance that would otherwise be required.


I believe any Azure user might be able to compile their 100 reasons to not use Azure, and the same will be true for most big pieces of software.

Even as someone that had minimal exposure to other clouds, I could easily see how Azure user experience lags due to the lack of proper care.

The number of pages with a filter bar that won't work properly until you remember to click "load more" should clearly be zero at this point; this is an objectively bad pattern that has existed for years and should be "easy" to fix. But the issue will probably never be prioritized.

The fact is that unless tackling those issues is part of the organization's core values, or they are clearly hitting the revenue stream, they won't be fixed. Publicity and visibility of those issues will always be crucial for the community of users.


I have significantly more experience in AWS, but I've spent equal time building and securing infrastructure in Azure for at least two years now. While AWS is not without its rough edges, I'd pick it any day.

My number one concern with Azure is availability of resources. Working within US regions, we've had to shift regions during production rollout because one or more of the resources we needed -- a current gen Azure SQL database or App Service Plan -- were simply not available. Rolling out an inexpensive VM (think equivalent of a t3/t4g.micro) is always a ride too, between unavailable SKUs or excessive quota gatekeeping.

Spending gotchas exist on any cloud, but we also know someone who got caught off guard in a completely new way recently. In late-December, the team needed to automate a database event once per day on an Azure SQL instance. Scheduled jobs aren't natively available inside Azure SQL, and so they reached for an elastic job agent. Everything went smoothly until someone dug in to a price increase on the January bill and asked why Sentinel had jumped from under $200 to over $3,000.

A colleague and I helped them dig in and quickly discovered that the controller for the elastic job agent is running dozens of batches per second in order to schedule that one job per day. With default security audit settings on Sentinel to meet compliance obligations, this generates over 600GB of BATCH_COMPLETE log messages per month at a cost of $5/GB for ingest!
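Back-of-envelope, those numbers hang together. With assumed values for batch rate and audit-record size (neither is stated above, so both are illustrative), you land in the same ballpark:

```python
# Rough check of the audit-log volume from the elastic job agent's control
# batches, at the quoted Sentinel ingest price. The batch rate and
# per-message size below are assumptions for illustration only.
batches_per_second = 30          # "dozens of batches per second"
bytes_per_message = 8 * 1024     # assumed ~8 KiB per BATCH_COMPLETE record
seconds_per_month = 30 * 24 * 3600

gb_per_month = batches_per_second * bytes_per_message * seconds_per_month / 1024**3
cost = gb_per_month * 5.0        # $5/GB ingest, per the comment

print(round(gb_per_month))  # ~593 GB/month -- the "over 600GB" ballpark
print(round(cost))          # ~$2966/month -- the "over $3,000" ballpark
```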


Vastly underrated cloud if you’re a small company and don’t operate containers is Cloudflare. I know they get criticized for other reasons but their DX is actually really great if you’re tired of the big 3 (4?)

IME even a small company runs into something you need an actual server for, and then you're suddenly spread across two clouds because Cloudflare is serverless or nothing.

Yeah, I don’t understand why Cloudflare isn’t competing head-on with cloud providers. IMO they should acquire fly.io, give them a blank check and 2 years, and I think they can take down AWS.

The reason aws is dominant is because it’s the default and a known quantity. But developers and cost conscious organizations will look at alternatives. Not saying it would be easy but the prize would be huge. Plus AWS seems in chaos with a lack of sensible leadership.


Same, I see it as the single missing piece in the puzzle. If they had a VM product then for small businesses it'd be such an obvious choice, especially if they provide anywhere even just half the nice integration with their other products as they do with workers.

I'm guessing it's just because they're so extremely all-in on every single of their products being on the edge, and you can't make that work with VMs without becoming far more expensive than any of their customers would pay.

Or maybe I'm clueless. But it sure looks that way from the outside.


Can't wait to be able to run containers there. [1]

[1]: https://blog.cloudflare.com/container-platform-preview/


I can’t believe how unlucky Fly.io is. This is going to be the second time Cloudflare steals their lunch.

I moved off a fleet of VPSes onto Cloudflare Pages and subsequently back to VPSes again due to unpredictable latency, several cases of downtime in 12 months and weird bugs around static assets disappearing long before the advertised retention date for old deployments.

It’s a JavaScript/node-like environment only right now no? I love it for my svelte site but it’s a massive limitation to be locked in based on language and request-response based constraints right now.

You need at the very least containers and persistent volumes to be interesting to me at least.


JS, TS, Python, and Rust are first class options, plus generic wasm:

https://developers.cloudflare.com/workers/languages/


We ran our whole platform, written in Rust, on Cloudflare Workers. It was not a great experience. You need to use their SDK, with really interesting bugs that never got fixed (we always forked a version). It was pretty hard to test anything locally; you just had to deploy your code to their platform, which made the feedback loop take so much time that it blocked us from delivering features fast enough.

And yes, you can test your local Rust code. It works nicely on your machine, but breaks with a really nasty error on their platform.

The target is `wasm32-unknown-unknown`, which allows you to use `fetch` as your only source of IO. Ok, their workers has a hacky socket implementation nowadays. Non-standard of course. And most of the ecosystem won't work really without forking everything and fixing the bugs by yourself.

We pivoted to a native Rust project. We still have one worker running in Cloudflare. We isolated that code from the workspace so that renovate updates will not touch it. You know, a random version upgrade might break the service...


Interesting choice to support Rust over Go if you ask me. Don't have numbers but I don't really peg Rust as a popular language for serverless web apps, certainly not to the extent of Go

Non-JS workers run via WASM, and Rust had WASM support before Go.

> and Rust are first class options, plus generic wasm

Rust is still a wasm target. Not everything easily works. It doesn’t have all the Cloudflare sdk features js/ts has either.


All of the JS features are available in Rust, but some don’t have a first-class SDK API yet and you must use wasm-bindgen.

I love Workers but they are not a panacea.

I work in both AWS and Azure and let me tell you, one thing I absolutely love about Azure is their portal. It’s like AWS 2.0 where all the cloud cruft is abstracted away and all that is left is the knobs you actually need to turn, and how they relate to one another.

I love me some AWS, but my god every time I have to dive into an unfamiliar environment and try and reverse engineer how everything connects - I need a drink afterwards.


I’m having a really hard time believing you are serious. The one time I tried out azure for a few days the portal was absolutely painful. Every click would take 5-10 seconds for a response. Sometimes basic settings change actions would take 2+ minutes of watching an Ajax spinner. How can anyone enjoy working like that???

Sure, the UI is sluggish, but at least you don't have to move through three different "services" to find the routing table that your VM is using.

AWS UIs are generally snappy and smartly designed individually, but they are horrendously organized at the general level. AWS is built as if you are exploring a relational DB containing your resources, instead of a deployment tree.

Your VM doesn't have a NIC in AWS, it has a foreign key to your entire VPC's NIC table, which lives in the VPC service, not the EC2 service. And then your NIC doesn't have an associated subnet, it has a foreign key to the subnet. And then when you get to the subnet table, you look up the routing tables table, and finally in the routing tables table, you'll find the settings for the routing table. This all works through following links, but the constant context switching and tabs upon tabs that AWS UI requires are extremely unpleasant for me at least to use. I'll take Azure's sluggish UI that organizes all of this in one-two pages instead of four any day.


AWS pages are built by different teams, and it shows. We're all supposed to use IAC though, right?

In all seriousness, even in the face of IAC, the one thing Azure can do that AWS can't [at the time this happened to me] is have a global view of everything that's running right now and costing me money. It was years back, it was a $5 bill, but the principle of it had me livid. I did my best to tear down everything after my evaluation, yet something was squirreled away costing money.

So yeah, absolutely, sluggish UI all the way (I also find the Amazon storefront profoundly ugly and disorganized).


For a view of everything that’s costing you money, you can just look at your detailed billing data. That’s been available for at least 13 years.

You using terraform state to look at route table entries?

ENIs are under EC2 in the console, not VPC; on the API/CLI they're all under EC2 together with all the networking.

If you click an instance and go to its networking tab you get a list of ENI IDs that are clickable links to the resource, same for vpc and subnet. If you click subnet you can just click the route table tab, so if you're on an instances networking tab the route table is 2 clicks away.

But rather than doing this, you could use Reachability Analyzer, which allows you to check routing tables and security groups for a source and destination IP/resource and port on the same or different VPCs connected with peering or TGW, and it will tell you if you're missing routes or SG rules in either direction. I created a slackbot that allowed our devs to input src/dst IP/domain and port, and that used this API to do the check for them; it saved a lot of time troubleshooting.

I had an absolutely horrendous time working in Azure a few years ago (as a network engineer), we did have quite a complex setup with custom route tables and Azure Firewall though and VPN connectivity between Azure and AWS, but stuff like their VPN gateway taking 40+ minutes to change instance size on, wtf? I've filed 2-3 bugs to AWS in the almost 10 years I've worked with it, all for newly created APIs/services, they were all fixed within a week or two. I filed 8+ bugs to Azure in the first month using them, none of them were fixed as they had workarounds instead. And their documentation is absolutely useless, I could never trust that I understood what I read correctly, I always had to verify that it worked that way by testing it.


God, you never tried Oracle Cloud. It's not a UI, it's an escape room. I would pay to switch to Azure.

must be a paid troll, no self respecting intelligent engineer would find the Azure portal good. it’s horrible ux, really convoluted and complicated, very unintuitive, horizontal scroll is a joke when the web scrolls vertically, tiny fonts making everything hard to read and screens overloaded with so much shit and yet they managed to not put on the screen the main thing that developers would care about. it’s a complete joke

> I’m having a really hard time believing you are serious

and

> The one time I tried out azure for a few days the portal was absolutely painful.

conflict with each other. Here's what you sound like:

"I don't believe you because I have very little experience in something and it doesn't comport with that."


And most links can't be opened in a new window; you can't right-click on them, and middle-click doesn't work!!

My god, I've been sending feedback about this shit for 3 or so years. You can't open any-fucking-thing in a new tab. The funny thing is, it used to work and they fucked that up.

I don't understand how dumb you must be to design a web site that way. It's like a brewery that sells their beer in plastic shopping bags and thinks that's good.


I think OP is trying to differentiate between Azure APIs, which are unbelievably slow and horrible, and the UI design itself - the layout, the font, how one screen will flow to another screen, what links to what, how it would be laid out in a tool like Figma.

Azure's APIs are atrociously slow. Azure's UI design is pretty nice. There's not much the UI designers can do about their API colleagues.


I know we're talking about AWS and Azure here, but had to add that fwiw, the M365 admin interface(s) are so bad it practically feels like a prank. In other words, it's as though someone is purposely making them as chaotic as possible to what end I can't even guess.

Add Intune to the list of bad MS dashboards…

I think it was the InTune interface I was in the other day that had the same link underneath 4 different sections of the dashboard, which I noticed when I had all 4 of them expanded at once? That got a good laugh out of me.

"Here...don't miss this settings page! Seriously! Look!"


Can anybody name a Microsoft interface with good UX? (Yes, the grandparent liked the Azure homepage, but that was also disputed by several others)

Does Microsoft Mouse count?

Not familiar with the product. The MS name alone would make me biased. Is it really good or even better in some way? Or did you just slightly ironically mean they did not manage to make it worse than competing products?

It was really good. I think Microsoft had numerous great mice, atypically for Microsoft: https://vswitchzero.com/2018/03/09/unboxing-a-22-year-old-mi...

A great thing about mice is that they are fungible and don't change without the user's consent, unlike software, so you can keep buying and using the same mouse forever.

The average mouse of the time was blocky and uncomfortable.


This comment reads like rage bait (I am not saying your opinion is "invalid" or that you're lying). I've never met anyone who likes the Azure portal lol, even people who live inside of the Azure ecosystem hour by hour.

The Azure portal has some nice ideas - in theory, being able to divide stuff into "resource groups" works a lot better than the AWS approach of "divide resources that should be isolated from each other into separate sub-accounts".

In practise, even the good ideas are implemented poorly.


Genuine question: is your comment satire? If not, that's hilarious, I find myself on the completely opposite end of the spectrum. To each their own!

Seriously, I absolutely can't stand the azure ui. I don't think the AWS console is great, but it is definitely better than the Azure one to me.

At least the Azure one lets you see all your resources, so you can verify you don't have something you're paying for unexpectedly.

That would be good if I could acquire resources in the first place. Even Oracle Cloud makes more sense (if you've ever heard of it)

Oracle Cloud was (and possibly still is) deleting random accounts (it was all over their subreddit at the time).

I wouldn't have believed it, but while testing out a server for a business, they deleted my account, and didn't reply when I emailed support about it.


Oracle did not even accept my credit card (the same I pay AWS and Scaleways with). So I missed that part of the experience.

That would have been nice, AWS kept sending me bills for $0.00, and after multiple tickets over a couple years, I finally deleted my entire account due to how pathetic their support was (they never figured out which service was active, and I couldn't find a way to figure it out using their UI).

AWS is actually great once you spend a few dozen hours in the service. If you are using the service for the first time GCP feels a lot smoother, but then you begin to hit corner cases, and GCP just breaks in those. Azure is bad the first time and gets worse over time.

My experience too, I could never find anything even when I knew it was there, and I was told by my boss to use it. I stand there for half an hour, credit card in hand, then go to AWS where the equivalent can be located by mortals. It's like they don't want anyone's business.

Yeah I don’t get this thread at all. I’ve used both fairly extensively and while Azure’s dashboard is still a pain in the ass, it’s better than AWS by a mile usability wise. Not to mention, Microsoft clearly puts time and money into their documentation, while AWS docs have always sucked.

Most of my complaints about Azure come down to the UI. So many head-scratching moments, and if you don't have a 4K monitor, lots of scrolling of menus inside of menus.

How do you find the price difference? Whenever I've done comparisons, they have always worked out significantly more expensive than AWS, and AWS is already pretty damn expensive for anything that requires a decent amount of compute.

I’m in the same boat. To me it’s bad UX. I can never find anything and it just looks way too complex to use. It shouldn’t be this way but as someone that uses Microsoft Entra, I guess I’m not surprised.

Don’t most developers have ultrawide 4/8k monitors these days though?

Are you an Azure UI dev?

Not on laptops, which you'll be using if you are on call. It's even worse if you don't have the eyes of a 20-something-year-old and have the text scaled up a bit.

"Most developers"

The world is a big place


Even when I have I often just use my 14" laptop screen when I work. If that's not enough then you have a shitty UI.

As an Azure Ops person spearheading a migration off AWS and another cloud, it's pretty funny. Some of it is nitpicky, some of it is sharp edges I've cut myself on, and some of it is Microsoft refusing to update the required TF version because they're bending over backwards for compatibility reasons, which is beyond frustrating.

However, all the comments about the Portal are baffling to me. The AWS portal is just all over the place. I feel like people are expecting AWS awfulness, and when a portal wants to be consistent, it breaks people's brains.

Oh yea, Day 313 with Public IP to put into DNS. Alias record that you noob. :P


For what it’s worth, I’m a student, and have had the benefit of seeing both the AWS and Azure web interfaces for the first time in the past couple of years. Azure was astoundingly more intuitive and less bizarre than AWS as someone with no experience working with the big clouds.

Even doing classwork involving AWS was an exercise in frustration. I couldn’t actually believe the sort of button trails and on-hover menus I was told to use to access various functions.

I don’t have the experience to evaluate the technical functionality, or whether AWS’s interface is better for experienced users, but I can definitely say it was far less approachable as a novice.


Yeah I don’t get this thread at all, AWS is a usability nightmare, as are all Amazon products. Microsoft products aren’t great usability wise but they’re clearly better than Amazon imo. I have a feeling a lot of the commenters here last used Azure when it was still in beta and the dashboard was live tiles Win 10 style.

Last time I tried to use Azure, it did not even offer domain registration. This was many years after launch, and not too long ago.

Not sure if this was an Australia/Oceania limitation - or just an ongoing product limitation.

My requirements weren’t complex. I needed to manage my domains (not AD), spin up virtual machines, and associate the two.

I also found the UI, overall, tedious. Finding the right offering under their ambiguously named services was difficult. And this comes from an AWS user.

I wanted to like Azure, but for at the very least the reasons above, it's not the product for me.


Did you want domain registration, or just DNS management? Those are two very different services. They offer the latter but not the former. So while you (generally) have to buy domains elsewhere, you can then manage them entirely within Azure after doing so.

They also offer the former. It is called App Service Domains (since "Domains" alone means AD domains).

I've been using it for 3 years now to register names for my job, as the Azure bill just gets paid. Anything outside Azure is a bureaucratic mess.


I legitimately hate Azure and I don't normally hate software.

I'm an engineer working on Azure (mainly AKS related). I'll try to get some of these issues attention and hopefully get them fixed ASAP.

Could you please address our ICMs and fix those issues already?

For me, Azure is very good at using it to do SAML/OAuth/OIDC for in-house and 3rd party applications. It works wonders and is very cheap. I mean, I think it is the best IdP out there in all SaaS offerings on the market.

The cloud part (VMs, k8s etc.) is something that I touch if I am being forced to. Even creating a VM is way more complicated than it should be.


I'm curious why people pick Azure, if anyone here has direct experience with making the decision.

I work at a startup that runs on Azure, and we're only here because of Microsoft's monopolistic behavior. We switched because Microsoft gives Office 365 discounts to our customers as long as all the SaaS services they use are hosted on Azure, and so our customers demanded we use Azure. Part of the monopoly playbook: "using a monopoly in one area to create a monopoly in another".

I used to work at GCP, and I thought it was almost shameful that we were in 3rd place behind Azure. Now it just makes me mad (especially since I had to migrate our startup from GCP to Azure).


Pretty similar reason, customers use Azure so the incentives are in place to run more things on Azure.

Case in point at work: we need to set up Azure infrastructure per-customer. Hitting the Azure RM endpoint from outside the Azure network is not reliable; the API endpoint's DNS record points to one of two IP addresses in westus, and when the DNS record flips (presumably for blue/green deployments) the no-longer-referenced IP address immediately aborts the connection. The official Azure Terraform provider throws an error when this happens and it usually results in Terraform state losing track of something that it already created. Azure support just says "well all we see is 200 OK from our side".

The "solution" is to run the Terraform workload from within Azure. The SLA is only really guaranteed if you're connecting to the Azure RM API from within Azure. Cue the insanity.
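Before we gave in and moved the runner inside Azure, the stopgap was blunt client-side retries around every call to the ARM endpoint. It doesn't save you once Terraform has already lost track of a resource, but it papers over the transient aborts. A generic sketch (the exception tuple is just whatever your HTTP stack raises on an aborted connection):

```python
import time
import functools

def retry_on_reset(attempts=5, base_delay=1.0,
                   exceptions=(ConnectionError, ConnectionResetError)):
    """Retry a callable with exponential backoff when the remote end
    aborts the connection (e.g. mid DNS flip on the ARM endpoint)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the real error
                    time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        return wrapper
    return decorator

# Demo: a call that dies twice with a reset, then succeeds.
@retry_on_reset(attempts=3, base_delay=0.01)
def flaky_call(state={"n": 0}):
    state["n"] += 1
    if state["n"] < 3:
        raise ConnectionResetError("connection aborted")
    return "ok"
```

Here `flaky_call()` returns "ok" after two backoff sleeps instead of propagating the reset.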


In my case, the company I work for isn’t a software company. The bean counters / IT group would rather just have something tacked onto their existing Microsoft subscription vs. something entirely different.

Also, I suspect part of the reason people are hesitant to use GCP because Google is perceived as a company that will gladly kill products off on a whim. Not great for something mission-critical.


From what I've seen, when people pick Azure, it's always about money.

Whereas with other providers, it can also be about money (because they offer a big discount or because the migrating partner is cheaper), but it can also be about wanting one service/feature that only X is providing; and once you're in, people tend to prefer to put everything there, instead of doing poly-cloud.

In my region, Azure salesmen are very active, providing huge discounts, so Azure is the most popular amongst big companies. Meanwhile smaller ones will go on AWS because it's easier to find information and (actually) knowledgeable people.

I used to work in a company using AWS: everything was managed through Terraform and we were as cloud agnostic as possible (mostly containers). Then we were acquired by a bigger company with an Azure deal, so they told us to migrate from AWS to Azure. They provided us with their own experts to help us, but six months in, we were still unable to have anything remotely viable for UAT. The experts were starting to acknowledge that even with their years of experience, they still weren't convinced by this whole Azure stuff, so they actually relied heavily on a legacy on-prem DC. That's when I left. Last time I heard from old coworkers, the product was still running perfectly fine on AWS, while there was still a team working on the migration. It's been more than two years now.

And I had other bad experiences with Azure. I know that cloud providers are not fun if you don't start with two weeks of training, so I try to stay open-minded, but no matter how many Azure experts I talked to, I never found one who was actually confident in using it.


Here:

  - we use mainly GCP
  - we do not want to use AWS because of a random political issue (absurd, in my opinion, but whatever)
  - we are being gouged by GCP and would like an alternative to help keep the price "acceptable"
Welcome, Azure...

Amazon competitors have strict policies against using SaaS products hosted on AWS, and they generally default to Azure since it's MS. In EMEA it's GCP.

How does this work? Do they demand you to use Azure for your servers so they get a discount? Or do you have to create instances of your product in e.g. VMs that are put in their Azure account? Did you have to completeley leave every last bit of gcp behind? How is this checked by MS?

What you call “monopolistic behavior” a senior/lead engineer with a brain will see as ecosystem compatibility. If you are working within an org that already uses Microsoft for everything, why the hell would you bother introducing a new stack everyone else will have to learn rather than use the Microsoft offering if it’s competent enough? On top of that, the Microsoft product will most likely work nicer with other MS products. Same reason people buy iPhones and Macs and stay in the apple ecosystem. Yeah it’s not as hip and exciting but enterprise development is rarely hip or exciting. At a startup not already using MS products, yeah no shit you can use whatever you want with little to no considerations for compatibility within the stack, especially when your main goal is cost savings.

I believe the pricing is cheaper; that is why some companies go with Azure.

400 reasons is impressive but the real test is what number could you get up to compiling a similar list for GCP or AWS.

At my previous company, we lost countless hours troubleshooting a Dockerfile that worked everywhere except on Azure. It used Node 18 as the base image, and the solution ended up being a chown 0:0 on everything inside the container, which took an absurd amount of time during every deployment.

Yes, I believe Docker/AKS somewhat recently defaulted to least privilege for container users, so you end up having to explicitly grant access to every little thing...
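For anyone who hits the same wall: the slow part is the recursive chown creating a whole extra image layer that rewrites every file. Assuming a stock Node 18 base image, setting ownership at COPY time gets the same end result without the extra pass (paths and the final command are illustrative, not from the original poster's setup):

```dockerfile
FROM node:18

WORKDIR /app

# Set ownership as files are copied, instead of a separate
# `RUN chown -R 0:0 .` layer that rewrites every file on each build.
COPY --chown=0:0 package*.json ./
RUN npm ci --omit=dev
COPY --chown=0:0 . .

CMD ["node", "server.js"]
```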

I am constantly forced to use Azure by idiotic companies which use .NET and the entire .NET mono culture which fetishises Azure and I can say with clear conscience that Azure is the shittiest dumbest most ill engineered clusterfuck of a cloud that has ever been unleashed on developers. It’s so bad that in the last 5 years even some of the most die-hard C# shops in the UK have changed their leadership and started to move away from Azure because they cannot afford to ignore the absolute insane state of it. Literally at every junction where Microsoft could have gone with a feature in Azure one way or another they somehow managed to not only pick the worse of the two, they somehow managed to bastardise it even more beyond anyone’s imagination.

hello dustin, back to your anti-.NET crusade even if there's literally nothing in .NET that requires you to touch Azure?

Would you tell us how the monoculture you speak of (which, for some reason, uses AWS, K8S, Postgres, Mongo, etc.) hurt you?


I'm running two k8s clusters with 6 vCPUs and 12 GB of memory each, and when we got the quote from Azure it wasn't pretty. The managed PG was a deal breaker for us. P.S.: side gig with 1 QPS on the EU cluster and way less in the US. P.P.S.: happy DigitalOcean customer

i think azure is fine. it wouldn’t be my first choice and they have some questionable choices but its possible to circumvent them.

Sometimes I create a webservice for fun in a single PHP file, using PDO to connect to a database, and then upload it using SFTP to my server. It scales pretty well unless I'm becoming the next largest tech company in the world.

Lol. I would follow an account like this for something I use a lot. Like C# as a language, Unity, hell, even Chrome. Has the energy of the "Linux Sucks" talks. Usually the undercurrent is that something that can have so many reported faults is something worth having a love-hate relationship with :D

That said, I only use Azure for redundancy. Hosting one app I have there on anything but Azure would be pretty much infinitely cheaper, especially when my IO quota goes above ~1 GB a month, which with Azure forces an instant change to a $20 plan for the month.


Does Azure's version of Github, Azure DevOps (ADO), still lack the ability to search pull requests?

it doesn't even support ed25519 ssh keys

It doesn't have any kind of commit metrics and only supports RSA keys.

Ah, most of the business guys are uncompetitive and make money just by being in already lucrative businesses. That's why there are solutions built with Microsoft technologies.

It's hard to believe people got paid to build that hot garbage fire.

They built azure? I thought it's all just SharePoint :)

Azure runs the enterprise. AWS runs the Internet.

Oh man, how many hours were lost because a managed identity could not be retrieved (400 Bad Request)…

I quite like Azure. Microsoft was also way more responsive than other cloud providers when we were looking to shift providers, we got free 3rd party consultation to setup our infrastructure in azure, it all ended up being more for less for us. It's all setup using infrastructure as code which is pretty maintainable and easy to add new stuff. Almost everything can be done via command line. Don't really use the portal UI to set anything up, but we do use it to look at the state of things. Haven't really had any problems with any of the services.

Only 400?

Scrolling through I don’t see any good reason but bunch of nitpicks.

Yeah, you can die a death of a thousand cuts, but I don't believe you don't get the same on GCP or AWS.


Cloud computing is a never ending shit show no matter whose cloud you are on.

Except for OCI. Holy shit they are the worst.


Is there somebody working with Azure devops? How do you find it?

I find it terrible, but it's what we have.

Boards works for basic Kanban projects, but if you want to delve into scrum stuff like sprints, burndown charts, etc., it's very bad and cumbersome. You'll have to do a lot of stuff manually that Jira does automatically.

Wiki... It's not good. It's extremely slow, and lacks a ton of features that Confluence has.

Pipelines is godawful, and has been suffering severely from a migration from their "Classic" pipelines to YAML. The funny part is that if you go in depth into YAML pipelines, you'll notice there's a very large amount of things that aren't configurable by YAML. Also has a ton of bugs, many of which have been open for over 5 years. To make matters worse, it's currently in an identity crisis with Github Actions (which has more features and is continuously getting them over Pipelines).

I don't know what's the future of Azure DevOps, honestly I feel like they'll eventually shutter it and move everyone to Github Enterprise.


Azure DevOps contains a lot of things, like a Jira equivalent, Azure Pipelines, etc. The Jira-equivalent interface is confusing, but it's not a showstopper; you learn to live with it.

We use it daily. The other option we had was a combination of self-hosted GitHub (which at first didn't have Actions), Jira, and Confluence. When Actions was not yet available, ADO was used for pipelines, so that was 4 services.

Give me 1 integrated service built with the same stack as Azure anyday. Builtin service connections, managed identities, etc.


I recently had to work with it and found it to be quite alright. Although my use cases didn't go beyond creating simple pipelines as examples.

Way better than anything made by Atlassian, but that's not a high bar.

It's serviceable...? They have their own custom languages for deployment, which is disgusting, but that seems to be the norm.

It works, but as with all Microsoft products it's in desperate need of some love and polish


Been using Azure for a few months now, mainly the AI part (AI Foundry, AI Search), and it feels like a product run by juniors with no guidance. One day, the entire PromptFlow service was down: no status, no info.

To create a new deployment (which is basically a PAYG model that doesn’t really require this limitation), you’d have to switch to the old UI to find the button to create one.

I agree with comments about cloud providers, most applications would be better hosted on VPS or services like Digital Ocean. But hey, software developers like to look smart by complicating things


And every single one is "Microsoft".

As a consultant who makes most of their money off of Microsoft technologies, I'm finding more and more reasons NOT to use Azure.

Chief among them is their famously bad support. Just google it: I've never spoken with anyone, or seen a single written word, saying that Azure support is even decent.

It's a race to the bottom platform, and I'm starting to get to the point where I want to start selling AWS.


> their famously bad support

I cannot count the number of times I've found a Microsoft support forum question that's exactly my problem too and the official tagged Microsoft support person fully misunderstands the question and then doesn't even properly answer their own misunderstood question


I see this title and all I can think is, "There's only 400?" I am not impressed with Azure, I wish my company was using AWS, everything about AWS was much more reliable. The Azure Portal is not to be trusted, it can just lie to you at times.

Related to this, I find the Azure Subreddit terrible. Each critical question is punished with downvotes.

It's like this in a lot of subreddits dedicated to a particular product. The regulars are die-hard $product fans, and respond to perceived negativity just as you'd expect.

You should try /r/macos! ;-)

I've seen the 0 vote questions throughout many product and non-product specific subreddits. Bots or just reddit vote fuzzing?


Just earlier this week it was charging me for some mysterious "api management service" --- it was such a pain in the ass to cancel. I had no idea what it was about. I contacted support, got them to supposedly reverse the charges and reported the credit card as lost before the charges went through just in case. I just wanted a damn api key and couldn't for the life of me figure out how to find it (I work at a cloud computing company, I do this stuff for a living and I still couldn't figure it out).

It's an obtusely confusing interface with opaque pricing that you can just magically sign up for with innocent sounding names. It's like the dark pattern people from Intuit came on over for a house party and got drunk one night.


The god awful UI is enough of a reason alone. It's EXTREMELY Microsoft.

"Azure’s Security Vulnerabilities Are Out of Control" - https://www.lastweekinaws.com/blog/azures_vulnerabilities_ar...

The great thing about Azure is not the security, it's the reports about how secure you are. The latter is legally required; the former is only visible to experts.

(2022)


Not 404 reasons?

Microsoft's cloud is hosting/protecting stretchoid.com, from which I get scans (hack attempts?) all the time. I am self-hosted, and those are a pain. As far as I know, stretchoid.com is not selling the scan data...

I plan to drop IPv4 and go no-DNS/IPv6 (/64 or /96 prefix) for self-hosting.


The Microsoft Graph SDK is the worst piece of shit I ever saw. The ONLY actual good part is the JS SDK. (I know, right?!) Aside from the horrible DX, it is FULL of bugs.

I was tasked with a project where it made sense to use Azure Durable Functions. Again... BUGS ... I reported a couple of them and even went and spoke with the product team about those. One BUG was due to a misunderstanding of how the framework works (in my defense, the documentation was very unclear) and the rest of the bugs are still not fixed almost 2 years later.

I decided to fail the project and restart with a different approach and framework.


I just have 1 reason: it is too expensive. Anything that Azure offers can be found much cheaper somewhere else.

I'm tired of Microsoft existing.

Working with enterprise-level Microsoft is a way to get grey hair fast. All the way back to Server 2003, we had the infuriating inconsistency of group policy, roaming profiles, and DFS drives. Everything is full of errors, and you will need a larger IT team as a result to deal with the headaches.

After using Google Workspace for IT, and AWS for infra, I always tell people to stay far away from Microsoft.

Even now, I have a friend who honestly can't deploy Intune, because of inconsistencies in the "type" of enrollment and whether it can execute a winget script as a result. Despite both machines being enrolled in Intune, the one that was enrolled during OOBE can run the scripts, but the machine enrolled in the OS cannot. Microsoft support has had that ticket for weeks.


Meh, as a developer who lived through the 2000-2010 era Microsoft, it’s easy to come up with a laundry list of reasons to hate on Azure.

But I tried Azure for my most recent startup because I was offended by AWS, and GCP did not have enough adoption among my customers, and Azure worked - fine.

What do you really need out of a cloud?

I want them to rent me VMs, for them to not go down, and to make it easy to do standard stuff like an object store, run containers, run databases, etc.

Azure was as good as or better than AWS.


I have used all the platforms personally and professionally. GCP, AWS, Azure, Oracle Cloud.

I will just say that Azure seems to want to do shit different for the sake of being different.

It is really annoying to write infrastructure as code for aws/gcp then go to do the same for azure and realize how dumb some of their stuff is.

Just my personal experience.


That’s how clouds try to lock you in, by making you use a custom tool that is different for the sake of being different.

If you use standard tools you don’t have this problem.

Containers running on VMs is standard.

A mesh of microservices that depend on cloud queues and managed services is not.

One argument against standard containers is saving dev time. You can still save dev time by using standard open source software. How many different ways are there to implement a queue or a load balancer?

If you really need access to some proprietary technology then by all means use the cloud that offers it. Eg if your customer demands GPT4.5, then go with Azure.

But if you need something standard, don’t get caught in the trap.


I am an older guy that was building Kubernetes clusters before EKS, AKS, GKE. So I used Terraform to build shit out to make it happen. Azure was 5x the code just to be different. You can try to blame Terraform, but if you used MS custom tooling it was no different.

What about the way Terraform is a 3rd class citizen on Azure? And there are multiple ever-changing ways of doing everything, major parameters aren't supported, etc. It just makes it more difficult to deal with.

Also, Azure APIs are incredibly slow.


It’s Bicep

I'd rather avoid the trap. I use the 3 major CSPs, so I would prefer to use cross-platform tooling.

Per the parent:

>That’s how clouds try to lock you in, by making you use a custom tool that is different for the sake of being different.

> If you use standard tools you don’t have this problem.


> I will just say that Azure seems to want to do shit different for the sake of being different.

That's Microsoft's MO in a nutshell in my experience, and I say this as a recent (~5 years ago) convert to Linux who built a career on Windows endpoints, servers, ADDS, Exchange, SCCM, you name it. It's how they achieve lock-in to their ecosystem, and it's incredibly frustrating to see how they've just layered that method of operation over and over again, decade after decade, rather than fix anything.


Fixing things is hard. Papering thing over with free Azure credits and marketing? This is the way.

Conversely, doing things "the same way" as AWS would mean copying their first-generation public cloud design flaws.

The overall UX of AWS is absolutely crazy. It's easy to "lose" a resource... in there... somewhere... in one of the many portals, in some region, costing you money! Meanwhile, Azure shows you a single pane of glass across all resource types in all regions. It's also fairly trivial to query across all subscriptions (equivalent to AWS accounts).

Similarly, AWS insists on peppering their UI with random-looking internal identifiers. These are meaningless and not sortable in any useful way.

Azure in comparison allows users to specify grouping by "english" resource group names and then resources within them also have user-specified names. The only random identifiers are the Subscription GUIDs, but even those have user-assignable display name aliases.

The unified Portal and scripting experience of Azure Resource Manager is a true "second generation" public cloud, and is much closer to the experience of using Kubernetes, which is also a "second gen" system developed out of Borg. E.g.: In K8s you also get a single-pane-of glass, human-named namespaces (=resource groups), human-named workloads, etc...
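Concretely, the cross-subscription view is one Azure Resource Graph query away; something like this (KQL, written from memory, so treat it as a sketch) lists every resource you can see across all subscriptions:

```kql
Resources
| project name, type, resourceGroup, location, subscriptionId
| order by type asc
```

You can run it from the portal's Resource Graph Explorer, or from the az CLI via the resource-graph extension's `az graph query` command.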


A single pane of glass that shows all your resources that are currently choking due to hidden limitations is no flex over AWS. It is my hope I never have to use Azure ever again professionally or otherwise

this killed me. i don't think AWS does everything well... but Azure went hard on being different just to be different...

like region names...


> What do you really need out of a cloud?

I need a cloud to be reliable and secure. I've used Azure extensively and it's neither. I'll take GCP or AWS over Azure any day.


This. We've used GCP Appengine for years and it is rock solid. Their SRE game is top level, and when there is an outage, they do a serious investigation and make it fully public, even if they screwed up badly. Including the vital "this is how we're going to stop this ever happening again". The last outage (that we noticed) was several years ago.

Azure tends to bite back: when they upgrade their backend, everything breaks.

Happened twice with my Kubernetes deployment. First, something with node groups made them incompatible and I had to recreate the cluster from scratch. Then one of their scripts to rotate key access to volumes (one you have to run manually, go figure) stopped working and caused my volumes to detach from pods; I had to recreate the cluster again, and I just gave up.

I was super happy as well, the first two years. By years six I was fully migrated out.


The UX not being designed by business majors.

You won't find that in major cloud providers.

I need it to be sane, to work and to be reasonably well documented.

Azure fails outright on points 1 and 3, and limps by on 2.

The products are a confusing mess, there are way too many ways to auth things, the docs are garbage, and I've tried multiple times to manage stuff via Terraform, which broke far too much to be excusable, to say nothing of the dumpster fire that is their UX.

I’m sure some people have either beaten it into submission, or have stockholmed themselves into putting up with it.


Why were you offended by AWS?

I got into a fight with them kind of like Trump and Zelensky just did. Not a technical reason.

Sometimes in business the deal falls through

I was on the receiving end and didn’t appreciate it.


Asking: Okay, clearly there are a lot of people here with lots of experience running their software on cloud services from Microsoft, Amazon, Google, etc. Good.

But what about a solo founder running their Web site on a "full tower" computer they plugged together themselves?

So, why use a cloud server farm with its expense and complexity?

Or, get a motherboard, a processor with 16 cores and a 4+ GHz clock, 128 GB of main memory, some rotating and/or solid state disks for a total of 20 TB or so, some external disks for backup, a recent copy of Windows Server, applications software from .NET, and a 1 Gbps Internet connection? The computer -- tower case, power supply, motherboard, processor, disks, and Windows Server -- costs ~$3000?

So, a 1 Gbps Internet connection, ~$100 a month, would have capacity of, say, 100 MBps. If sending a Web page with 200 KB, then the peak capacity would be

     100 MBps / 200 KB = 500 pages/second.
Then 500 pages a second with 5 ads per page with revenue of, say, $2 per thousand ads sent (CPM), that would be

     500 * 5 * 2 / 1000 = $5/second
at peak capacity or maybe an average of half that for $2.50/second.

Then at 16 hours a day (an hour is 3600 seconds) that would be revenue of

     2.50 * 3600 * 16 * 30 = $4,320,000/month
For the electric power, at 200 W and $0.10/KWh, that would be

     200 * 24 * 30 * 0.10 / 1000 = $14.40/month
How many users?

With peak capacity of 500 pages a second and an average of half that, 250 pages a second, for 16 hours a day, that would be

     250 * 3600 * 16 * 30 = 432,000,000
pages a month. If on average each visit is 5 pages, that would be

     432,000,000 / 5 = 86,400,000
visits per month.

If users come on average 2 times a week, that would be 8 visits a month, or

     86,400,000 / 8 = 10,800,000 users
from one tower case and some Web page software.

So, why use a cloud server farm with its expense and complexity?
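The back-of-envelope model above can be scripted for sanity-checking (the figures are the same assumptions as the comment: ~100 MB/s usable on a 1 Gbps link, 200 KB pages, $2 CPM, 16 active hours a day; note that the per-month multiplier uses 3600 seconds per hour):

```python
# Back-of-envelope capacity/revenue model for one server on a 1 Gbps link.
# All input figures come from the comment above; 3600 is seconds per hour.

LINK_BYTES_PER_SEC = 100e6   # ~1 Gbps, conservatively 100 MB/s usable
PAGE_BYTES = 200e3           # 200 KB per page
ADS_PER_PAGE = 5
CPM_DOLLARS = 2.0            # revenue per 1000 ads served
ACTIVE_HOURS_PER_DAY = 16
DAYS_PER_MONTH = 30

peak_pages_per_sec = LINK_BYTES_PER_SEC / PAGE_BYTES        # 500
avg_pages_per_sec = peak_pages_per_sec / 2                  # 250
revenue_per_sec = avg_pages_per_sec * ADS_PER_PAGE * CPM_DOLLARS / 1000

active_secs_per_month = 3600 * ACTIVE_HOURS_PER_DAY * DAYS_PER_MONTH
monthly_revenue = revenue_per_sec * active_secs_per_month
monthly_pages = avg_pages_per_sec * active_secs_per_month
monthly_users = monthly_pages / 5 / 8   # 5 pages per visit, 8 visits/month

print(f"peak throughput: {peak_pages_per_sec:.0f} pages/s")
print(f"ad revenue:      ${monthly_revenue:,.0f}/month")
print(f"unique users:    {monthly_users:,.0f}/month")
```

Every output scales linearly with the inputs, so it's easy to swap in your own CPM or link speed.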


Redundancy and failover capability, mostly (in terms of everything.... power, network, storage, compute.) For a hobby project, what you describe is probably fine. For a real business, do you want to tell customers you're down because your one computer shit the bed and you have to run to Best Buy to get a new motherboard?

> For a hobby project, what you describe is probably fine.

Thanks! Yup, it will be "a hobby project" until, if ever, it gets some users and revenue. Then, have several servers, some load balancing and redundancy, uninterruptible power with a generator outdoors on a concrete slab, contact Cloudflare or some such and have them do what is needed, etc.

Just looked up SSL, reverse proxies, firewalls, etc. Okay.


If you want an RP, check out HA Proxy.

If you're virtualized on your host, 2x HAProxy on top of OpenBSD utilizing carp. It's great fun to set up and run -- and once you have it running, it's stupid stable. Very little maintenance required.

You can do this with Linux & keepalived, as well.
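A minimal sketch of the keepalived side, assuming two Linux boxes running HAProxy and a shared virtual IP (the interface name and addresses are placeholders):

```conf
# /etc/keepalived/keepalived.conf on the primary box.
# The backup box uses "state BACKUP" and a lower priority (e.g. 90).
vrrp_instance VI_1 {
    state MASTER
    interface eth0          # placeholder: your LAN interface
    virtual_router_id 51    # must match on both boxes
    priority 100
    advert_int 1            # VRRP advertisement interval, in seconds
    virtual_ipaddress {
        192.0.2.10/24       # the VIP that HAProxy binds to
    }
}
```

If the MASTER dies, the BACKUP claims the VIP within a few seconds and traffic keeps flowing.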


Your local site can't implement DDoS protection though, you have to buy that from cloudflare or some other reverse proxy anyway, so cloud is always in your future in some form. Also your local site can't move to Australia or Taiwan or wherever when you realize your user base is more global than you thought.

I mean, your intuition is correct that "cloud" is mostly just a bunch of boring, standard computers running boring standard software and there isn't anything they do that you can't.

But at the same time, boring standard software is (by definition!) commoditized and if you're spinning up some new and interesting thing, it's only going to be differentiated by the parts that are not boring and standard. So put the boring standard stuff on a credit card and do the interesting stuff instead.


If you have the skills, and the bottom line is still positive (include opportunity costs and personnel costs, ease of getting SOC2 / ISO certification if that's relevant to you, ease of scaling up and down), then you should go for other solutions.

I advise CEOs of SMEs on this, and I can tell you that the main concern they have is the availability of people to build and maintain the systems. Because cloud / k8s is more popular these days, that's what they go for. If we could reliably find smart system operators who would happily maintain a couple of racks of servers for years, it would be a more viable option.


Because backups.

Because the constant attacks you’ll be under from china/russia that start 5 minutes after your servers are live.

Because as a solo founder you should be hammering on new features not putting out sysadmin fires.

I am happy to give AWS $200 a month to look after all that crap.


Just disable ssh password access, use ssh keys and change your default ssh port. 99% of those attacks will go away.
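Concretely, that's a few lines in sshd_config (the port number here is arbitrary):

```conf
# /etc/ssh/sshd_config
Port 2222                      # any non-default port cuts drive-by scans
PasswordAuthentication no      # keys only
PubkeyAuthentication yes
PermitRootLogin prohibit-password
```

Reload sshd after editing, and confirm your key works in a second session before closing the first.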

If your site isn't static, you need a Web Application Firewall (WAF). CN/RU will probe the website all day looking for ways in.

Where’s the firewall? Reverse proxy? SSL certificate management? High availability? Patch management? SCM? Central logging and alerting?

As someone who ran many IIS boxes since IIS 4, I greatly prefer Azure websites over having to worry about all the “other stuff” that comes with onprem. Yes, the cloud needs an RP, WAF, etc, but they’re always HA and simple services, not another box to maintain.


Careful, anytime the topic of self-hosting comes up, you will see a bunch of engineers crawl out of the woodwork to insist only the cloud can handle the complexity.

Because there was literally no internet before AWS. It just didn't exist. It's a story we made up.

Because you might need 20,000 of those "servers" immediately, and not have the up-front capital for an investment like that. And maybe it doesn't work out and you only needed those servers for 2 yrs vs $your_depreciation rate.

And you'd need about 19,997 less of those servers if you got rid of all the scummy adtech infrastructure you're implementing and just focused on the core product. Unless your core product is ads and marketing data, in which case boil those oceans so that you'll be able to show mattress ads to someone that's 0.3% more likely to be influenced to buy a new mattress when they buy a pack of gum if they're running dark mode and have a Mac but use an Android phone and are located within 217 miles of Arkabutla, but only if it's raining.

I don't see colocation charges in there, unless you were planning on running your 'server' out of your house on a residential internet connection (that probably has restrictions on acceptable use.)

By the time you spec out a real server (redundant power, higher quality components than Newegg stuff), rent some space in a rack, pay for bandwidth (50Mbps is going to be about it without paying a premium), you're going to be looking at $5000 + $300/m. All that effort, whereas you could spin up something in the cloud for a bit more per month.

This does flip quickly, however. Once you get into the high 5-figure monthly spend, running your own hardware makes sense again. DHH's blog posts on 'Leaving the Cloud' are a great read.


> you're going to be looking at $5000 + $300/m

I looked at Hetzner's $150/m for dedicated Xeon server with 1Gbps.


If your stuff can fit on a single server at home, and you're comfortable managing it, by all means do! It’s definitely way WAY cheaper, and if budget matters, that’s great. Nothing wrong with that IMHO. Obviously you can’t 100x overnight, but that’s realistically not gonna happen. And if it does, then you can start to migrate, which probably won’t be that hard, because it’s just impossible to make a single machine anywhere near as complicated as a cloud setup.

For small orgs: global reach!

If you need an active component near your customers for low latency responses, the cloud makes it very cheap to deploy tiny VMs or small containers all over the place. It’s trivial to template this out, scale up and down for follow-the-sun or to account for local traffic spikes.

If you need it, you need it, and nothing else meets this need except perhaps some CDNs with “edge compute” capabilities — however those are quite limited.


I can’t believe the DoD even considered picking them for JEDI.

Azure has its issues, but this kind of extreme take is hardly useful. Any large-scale cloud provider has problems—AWS had its fair share of major outages too.

this is not about outages

Azure ≈ Klout


