A couple of years back, I was working at Mojang (makers of Minecraft).
We got purchased by Microsoft, which of course meant we had to at least try to migrate away from AWS to Azure. On the surface, it made sense: our AWS bill was pretty steep, iirc into the 6 figures monthly, and we could have Azure for free*.
Fast forward about a year, and countless hours spent by both my team and Azure solutions specialists (kindly lent to us by the Azure org itself), and we all agreed the six-figure bill to one of corporate daddy's largest competitors would have to stay!
I've written off Azure as a viable cloud provider since then. I've always thought I would have to re-evaluate that stance sooner or later. Wouldn't be the first time I was wrong!
When I worked at Jet, a shopping website trying to compete with Amazon, we obviously did not want to give money to Amazon, so we used Azure.
For the most part it was just fine, until we started using CosmosDB (then called DocumentDB).
DocumentDB, in its first incarnation, was utterly terrible. The pricing was extremely hard to predict, so we would end up with ridiculous bills at the end of the week, and the provided .NET SDK was buggy and horrible, but the very worst part was that the web UI appeared to be directly tied to your particular instance of CosmosDB.
Why is this bad? Because if you under-provisioned stuff for your database, it might start going slow, and it would actually lag the web interface you would use to increase the resources! We got into situations where we had to turn off the entire application just to bump up the resources for Cosmos. It felt like complete amateur hour from Microsoft.
My understanding is that Cosmos has gotten a lot better, but man that left a sour taste in my mouth. If I end up getting some free credits or something, maybe I'll give Azure another go but I would definitely not recommend it right now.
A team in my org worked with Jet for 2+ years to help y’all scale.
It was interesting seeing the biweekly status updates; they basically all started with “This is how Jet.com broke Azure core services this week”.
As much as it sucks, this was a deliberate strategy all the way from Satya - every employee knew Azure was a joke, but the only way to actually fix shit was to get internet-scale customers to break it daily and weekly.
> but the only way to actually fix shit was to get internet-scale customers to break it daily and weekly.
I don't get it. There's lots of distributed systems theory that could provide a more robust, analytical approach to a scalable architecture. If a system is regularly breaking like this, it sounds like it should be a "back to the drawing board" moment.
> My understanding is that Cosmos has gotten a lot better
For some values of "better", I guess. Performance is still terrible, their data visualization/inspection tools are shameful, their SQL dialect is finicky and has no error reporting beyond "something is wrong with the input value", and their official Python SDK has a race condition bug that can silently clear out your documents when under heavy load.
I used to work at a Cosmos-heavy house and I would utter "fucking Cosmos" around 15 times a day.
> My understanding is that Cosmos has gotten a lot better, but man that left a sour taste in my mouth.
A couple of years ago I stumbled upon an Azure project which started off using the old-timey Cosmos DB. Looking at the repository history from those days, I saw a bunch of Entity Framework configurations and navigations and arcane wizardry that would take an engineer months to really figure out.
Then there was an update to CosmosSDK, and all that EF noise was replaced by single CRUD operations that took the unserialized object, id and partition key as input. That's it.
Worlds of difference, and a simple key-value query takes ~10ms to do.
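For reference, that point-read shape looks roughly like this today - sketched here with the Python azure-cosmos SDK and placeholder names (the comment above is about the .NET SDK, but the surface is the same idea):

```python
from azure.cosmos import CosmosClient

# Placeholders: account URL, key, database/container names, and the
# assumption that the container is partitioned on /customerId.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("orders")

# Upsert: just the plain object, with id and partition key in the body.
container.upsert_item({"id": "order-1", "customerId": "cust-42", "total": 19.99})

# Point read by id + partition key -- the ~10ms key-value query.
item = container.read_item(item="order-1", partition_key="cust-42")
print(item["total"])
```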
> Unless that query goes over the Internet to another continent, that's a really long time, isn't it?
If you're hosting a service in a cloud provider and you implement your services so that your call to the cloud provider's database goes over the internet to another continent, you have serious problems but none of them are caused by the cloud provider.
I don't think anyone making that sort of claim knows anything about cloud services. A single roundtrip of a no-op request within the same data center takes 0.5ms. Add querying across multiple partitions and data seeking, and you don't find cloud providers doing better.
To frame how oblivious that claim is, DynamoDB is lauded for its response times being sub-20ms.
No, that's exactly what I mean: I expect the round-trip within a datacenter to take roughly 0.5ms, and I expect the key lookup to take about that time or less, so 10ms is roughly an order of magnitude more than I would expect a "simple key-value query" to take.
For comparison, if you run your own hardware and do a memcached KV lookup against a different server on the same rack, p99 times are slightly under 1ms. Given the guarantees of CosmosDB, ~10ms isn't that bad for a p100.
No reason to apologise - I assumed and tried to give a better reference.
It is an absolute eternity though. A KV lookup is fractions of a microsecond in a managed language like C#. An HTTP request is in the microseconds range on localhost and a smidge more on a performant local network. A poorly behaved local network (busy wifi on an ISP router) is 1-2ms, and I can do a round trip to my nearest AWS region in 10-15ms from my home network.
It’s an absolute eternity, and when thinking about this stuff and scaling, think “is it worth slowing down the normal case by 1000x to introduce an external service”
ConcurrentDictionary<K, V> read latency is going to be around 7-15ns for the data hot in the cache, scaling with key length if it is string-based, and anywhere between 75ns and 150ns for reading that out of RAM. Alternate implementations like NonBlocking.ConcurrentDictionary can reach down to 3.5-5ns for integer-based keys given all data is in L1 and the branch predictor is fully primed, on modern x86_64 cores.
> Because if you under-provisioned stuff for your database, it might start going slow, and it would actually lag the web interface you would use to increase the resources!
What the ever loving heck... seriously?! Why wouldn't this be a control plane API that reconfigures a data plane?!
I think it was querying the database to get usage information, and it was blocking the loading of the page as a result. It's been a few years, but if I recall correctly, the .NET API for changing the database provisioning did work, so some people would hack up a quick script to change resources in an emergency.
I think they did fix it eventually, because I know that multiple people on my team complained to Microsoft about it. Very short-sighted decision on their end, but to be fair it was a brand new product.
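For flavor, this is all such an emergency script needs to do against today's SDKs - a sketch using the Python azure-cosmos package with placeholder names (the folks above were on .NET, and the DocumentDB-era "offer" API looked different, so treat this as an approximation):

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("orders")

# Read the current provisioned throughput (RU/s) and double it,
# bypassing the laggy portal entirely.
current = container.get_throughput()
print("current RU/s:", current.offer_throughput)
container.replace_throughput(current.offer_throughput * 2)
```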
What's so surprising about that? If your CRUD operations take longer to do, and you're doing those to drive a GUI, of course the GUI will lumber along.
It's good practice to separate out your control plane and data plane, so in this kind of scenario you can use the control plane freely to manage and scale up the data plane, without worrying about an under-resourced data plane affecting your control plane operations.
The reverse also applies; by separating them you can have issues with your control plane but not have the database go down.
It shouldn't lag the web interface. There are common ways to handle this, such as making all resource mutations asynchronous, or if you want to keep it synchronous, ensuring that control processes get guaranteed CPU resources. It is not a particularly challenging thing to do.
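To illustrate (my own toy sketch in Python, not how Cosmos is actually built): the control plane accepts the mutation immediately and a separate worker applies it, so a struggling data plane can never block the knob you need to fix it.

```python
import queue
import threading

mutations: queue.Queue = queue.Queue()

def control_plane_scale_request(new_capacity: int) -> str:
    """Cheap and always responsive: just record the desired state."""
    mutations.put(new_capacity)
    return "202 Accepted"  # the caller gets an ack, not a finished resize

def apply_to_data_plane(capacity: int) -> None:
    # Stand-in for the slow part: actually resizing the database.
    print(f"resizing data plane to {capacity} RU/s")

def reconciler() -> None:
    """A separate worker, with its own guaranteed resources, drives the data plane."""
    while True:
        apply_to_data_plane(mutations.get())  # may be slow; nobody is blocked on it
        mutations.task_done()

threading.Thread(target=reconciler, daemon=True).start()
print(control_plane_scale_request(10_000))  # returns instantly
mutations.join()  # demo only: wait so the background resize gets to run
```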
Your story reminds me of when Microsoft acquired Hotmail in the '90s and they tried migrating from FreeBSD & Solaris onto Windows NT/IIS. Having the world's largest email service running on the Windows stack would have been a huge endorsement. It took years until they were successful.
While I don't disagree with that, in my experience all Windows instability in the NT family (and I've worked closely with every end-user version of Windows, from 16-bit 3.11 to the recent Win11, with very few exceptions) is caused by faulty hardware and/or bad drivers that can't handle it.
I can't remember any issue that I couldn't attribute to bad HW or a 3rd-party driver.
Wrt Win95 and its kind - all processes in that family essentially ran in a single address space, and data "isolation" was "achieved" only through obscurity. If you knew some magic constants that were easily obtainable from disassembly, you could do anything there. So no wonder it was as bad as the worst program you'd installed.
Windows 2000 Server was peak Windows. All the subsequent versions just got harder to maintain as they gradually ruined the user interface. Nobody cares about the UI on consumer Windows, but if you're spending a lot of time in RDP, the Vista-based server products are terrible.
I don't hate Windows Server 2019, but Linux is better, easier, faster, and a relief after any futile attempt to use IIS or SQL Server in 2025.
Windows XP x64 Edition was pretty slick, and so was NT4. I agree that 2000 was pretty cool, but perhaps a lot of that is design nostalgia. It was very "serious business OS", where XP and Me looked like jellybeans and cartoons. My favorite Windows, though, is Win 7 Ultimate, Steve Ballmer Edition. I was sad when I had to upgrade to winten.
I get the nostalgia for XP, it was the first windows consumer edition that didn’t suck, but for a server OS 2000 was so lean and easy to manage it makes me wonder how MS lost to Linux. Back then, it was a genuine competition, now you’d have to be crazy to choose windows to deploy anything.
I wish MSFT could build Active Directory and the associated constellation of services on Linux. You can make a reasonable simulacrum with Samba but it isn't as well-integrated.
(My fever dream wish is for a "distribution" of NT that boots in text mode and has an updated Interix subsystem alongside Win32. Throw in ZFS and it would be awesome.)
PowerShell was 2006, so I suppose the real "peak Windows Server UX" was 2016, when PS was relatively mature and came out of the box with the latest version.
If MSFT had backported servicing stack updates to 2016, it would still be usable. As it stands, it bogs down unreasonably when applying updates and needs lengthy DISM /Cleanup-Image runs periodically to reclaim disk space.
I went from 98 to 2000 (rather than ME) and it was an amazing experience. It showed me what an operating system could be like. Of course, what I really wanted was Linux, but I didn't know better at the time.
I dunno how to compare stable to stable but I ran Win2k for so long that I got bored with it (something like 5-7 years) and never experienced a single crash. This is coming from a Linux guy btw… so I’m no Microsoft fanboy, just saying, it was as stable as any other stable OS.
I saw years of uptime on those systems whereas Win2000 iirc needed a reboot for every single update of the OS, and even for applications like IIS or Exchange.
Compared to NT4 it was probably very stable, since I remember telling most clients to just shut it down Friday evening and boot it Monday morning cause the pre-SP4 NT4 could not stay up more than three weeks.
Compare that to AS/400, where we pushed updates all over the country, without warning clients, to systems running in hospitals, and there was never even the slightest problem. It sounds irresponsible to do that today, but those updates just worked, all the time, and all applications continued to work.
SQL Server is really Sybase tho, which was always capable of running on UNIX.
Can't say much more, but I worked on a huge (internal) Sybase ASE on Linux based app (you've _all_ bought products administered on this app ;) ) way back (yes, pre-SSD, multi path fiber I/O to get things fast, failover etc.) and T-SQL is really nice, as is/was ASE and the replication server. Been about 20 years tho, so who knows.
I worked with SQL Server a bit, writing a Rust client for it back in the day. The manual is really good, explaining the protocol clearly. That made it really easy to write a client for it.
SQL Server uses NT and Win32 APIs, so the SQL team built a platform-independent layer. Meaning NT and Win32 are still used by SQL on Linux. It’s pretty cool tech.
The tone and content of this document is shockingly candid and frank. I think it did a ton to make Windows Server a better product. I have a lot of respect for the people at MSFT who reviewed the company's own product in such a critical light.
Of course. Why would you expect anything but? Pride is actually a very good driver of change if you ask me because people often do their best work when they are proud of what they are building.
The 90s were the dark ages of cloud computing. It was the age of the system administrator, desktop apps, Usenet, and the start of the internet as a public service. At the time, concepts such as infrastructure as code, cloud, and continuous deployment were unheard of.
AWS, which today we take for granted, launched in 2002, and back then it started as a way to monetize Amazon's existing shared IT platform.
Of course migrating anything back then was a world of pain, especially when it's servers running on different OSes. It's like the rewrite from hell, one that can even cover the OS layer. Of course it takes years.
> At the time, concepts such as infrastructure as code, cloud, and continuous deployment were unheard of.
There existed different names and solutions for things like cloud. I worked with Grid Engine in 2000 after Sun acquired Gridware, but that project started in 1993. By 2000 we were experimenting with running Star Office on the grid and serving UI to thin clients (kind of what Google Docs or Office 365 do now, but on completely different stack).
You've got me curious: what was the single biggest barrier to migration, if you're able to disclose it? I'm guessing it was something proprietary to AWS, like how they handle serverless or something that couldn't translate over directly, but I'm always eager to learn why a migration from X to Y didn't work.
This is a couple of years ago, so I fully expect most of the issues we had back then to be fixed by now, but it was definitely Azure that was the problem.
We wanted to use their hosted Kubernetes solution (I forget the name) and pods would just randomly lose connection to each other; everything network-related was just ridiculously unstable. Host machines would report their status as healthy, but then be unable to start any pods, making scaling out the cluster unreliable. I also remember a colleague I regarded as a bit of a wizard being very frustrated with CosmosDB, but I cannot for the life of me remember what the specific issue was.
Our solution was actually quite well written, if I do say so myself; we had designed it to be cloud-agnostic, just on the off chance that something like this would happen (there may have been rumours this acquisition would happen ahead of time).
But Azure was just utterly unable to deliver on anything they promised, thus the write-off on my part.
Ohh, AKS. I had the 'pleasure' of using it quite early on, 5/6 years ago. We kept killing it by actually using it with more than 50 pods. You know you're one of the few serious users when you get through the 3 layers of support and get to talk to the actual developers in the US.
But my impression is that it's better now. In general, my experience with Azure is that the base services, the ones that see millions of hours of use, are stable. Think VMs, storage, queues, etc. But the higher up you go in the stack, the fewer hours of use they see, and the lower the quality gets.
I appreciate the context! Honestly, every time I start thinking that K8s might be the "universal cloud" orchestrator I've been hoping for/working towards, stories like this remind me just how...tenuous it can be relative to traditional VMs and standard containers-as-appliances. Still working to get my Admin cert for the spec, but it's definitely not something that sparks joy, if you catch my drift.
Pods losing connection to each other is still very, very common in Azure.
To see the positive side of it, it’s a Chaos Monkey test for free. Everything you deploy must be hardened with reconnections, something you should be doing anyway.
What’s frustrating is that it happens rarely enough to give the illusion of stability, making it easy for PMs to postpone hardening work, yet often enough to put a noticeable dent in your uptime if you don’t address it. The perfect degree of gaslighting.
> To see the positive side of it, it’s a Chaos Monkey test for free. Everything you deploy must be hardened with reconnections, something you should be doing anyway.
Keep in mind that in Azure it's a must-have, whereas everywhere else it's either a nice-to-have or a sign your system is broken.
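For anyone wondering what "hardened with reconnections" means in practice, the core of it is just retrying transient failures with backoff and jitter - a minimal Python sketch with hypothetical names (tune the attempts, delays, and exception types to your stack):

```python
import random
import time

def with_retries(op, attempts: int = 5, base_delay: float = 0.2):
    """Run op(), retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:  # substitute whatever "transient" means in your stack
            if attempt == attempts - 1:
                raise  # out of retries: let the failure surface
            # Back off before reconnecting; jitter avoids thundering herds.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Usage: wrap any cross-pod call that might drop mid-flight, e.g.
# result = with_retries(lambda: client.get("/health"))
```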
Having worked on the infra side of azure, I'm not surprised. Network is centrally managed and that team was a nightmare to deal with. Their ticket queue was so bad they only worked on sev 1 and the occasional 0. Nothing else got touched without talking to a VP and even then it often didn't change things.
Curious about details too. The parent's conclusion is to write off Azure, but I wonder if it's actually AWS or the way they use AWS that makes it hard to migrate.
Or to put it another way: if Mojang had started with Azure but couldn't manage to migrate to AWS, which provider would the parent write off?
My experience has certainly been that AWS is both a) more stable, and b) has more migration resources and guides than the reverse.
It was easier to go to AWS than to Azure; and I've done both in the past ~4 years. Migrating to AWS was just technical work. Migrating to azure was 'fight unexpected bugs and the fact it doesn't actually work in surprising situations'.
The only reason to go to azure was that Microsoft waved a big fat $$$ discount to use their services.
Yup, the hardest part about migrating to Azure was jumping between all the managed services that work everywhere else but are insanely buggy on Azure. We ended up with the most basic architecture you can imagine (other than AKS, which works great as long as you don't use any plugins) and we're still running into issues.
We have a very long list of Azure features and services that we've banned people from using.
Just got off a call with someone at Azure today who told us to set up our own NAT gateway instead of using Azure's, because of an outage where we made too many requests and then got our NAT Gateway quota taken away for the next 2 hours.
> Yup, the hardest part about migrating to Azure was jumping between all the managed services that work everywhere else but are insanely buggy on Azure.
Care to point out a concrete example? I've worked with Azure a few years ago and I wouldn't describe it as buggy. At most, I accuse them of not "getting" cloud as well as AWS. For example, the whole concept of a function app is ass-backwards, vs just deploying a Lambda with a specific capacity. That is mainly a reflection of having more years working with AWS, though.
My experience with AWS is the documentation can be bad/wrong, it can be difficult to find stuff, but the actual services are very solid. It does what you tell it to do. Stuff works.
My experience with Azure is that it simply breaks in ridiculous and frankly unacceptable ways, all the time. It's like someone is unplugging network and power cables every couple hours just for fun.
… I was scrolling back through the “400 reasons not to use azure” in the OP and off the top of my head I’ve seen a dozen of them personally.
You decide if that’s more or less stable than AWS.
I’d say the evidence is pretty empirical, but hey, all I can say is my experience was utterly unambiguous.
You can argue a lot of things, but hundreds of azure fails in a big giant list is probably one of the tougher ones to go “no, this is fine compared to AWS!” about, imo.
The networking is so, so bad in azure. We ran into all kinds of craziness with simple things that kept kicking us in the nuts like port exhaustion between different subscriptions. I quit my last job partly because they were wedded to azure.
Oh you guys finally managed!? Cool to hear! I guess Azure must've gotten better then, back when I was there, the conclusion was that Azure simply wasn't mature enough to host Minecraft yet.
Again, this is a while ago: I remember when I started, we were just starting to replace the old yggdrasil servers with the new micronaut-based system, which I think is still in use today?
I still remember that application fondly as the best-architected piece of software I've ever worked on. I hope all is well!
I literally cannot log in to it after it was forcefully migrated to Microsoft. Microsoft doesn't recognize my computer as not-a-bot. Something to do with being Linux, I imagine.
Oof, this won't help you, but I recall sending a very angry email to my new corporate overlords in the Xbox org for blocking me from using their web services if my user agent said Linux.
This was during the "Microsoft <3 Linux" campaign, and I think I cited that and then told em Minecraft would not be able to move forward with Xbox account migration until they stopped such idiocy.
Since I was the dev tasked with migrating Mojang accounts to Xbox accounts, I felt I had at least SOME credibility to my claim that it was blocking me.
But honestly, modifying my user agent was easy, it just pissed me off.
They did fix that the same day tho, so I guess they believed me!
Moved my Minecraft license to a new Microsoft account as required. Microsoft account was flagged and blocked when I checked a few days after. And that's how I got scammed out of my Minecraft license.
It wouldn't have helped if you hadn't, since they've now deleted all non-Microsoft accounts.
Reminder to whomever is reading: if you bought the game during alpha, you have the right to all future Minecraft games and a premium account forever. Microsoft barely tried to uphold this by giving a free Bedrock license to alpha buyers for a limited time several years ago. I suppose you'd have to sue them now if they break it, and the judge will wonder why you bothered to bring a $20 dispute to court.
This is a story from some parallel universe right here. You bought a Microsoft game and weren't able to run it on Windows, but it works on Ubuntu?!? I almost spilled my coffee reading this.
The problem was you probably ended up falling for the UWP version instead of the Java version. The Java version remains the community accepted “proper” version to this day.
Yep, sounds like my experience. Years ago, we migrated off Rackspace to Azure, but the database latency was diabolical. In the end, we got better performance by pointing the Azure web servers at the old database that was still in Rackspace than we did trying to use the database that was supposedly in the same data centre.
I kicked up a stink and we migrated everything to AWS in under a week.
That highly depends on what services you're using.
We migrated from AWS to GCP in 2016/2017 (mostly VMs and related stuff, CloudFront, etc. - no lambdas) and it was pretty painless; everything worked smoothly until the end of that company.
Is it because there are features that AWS provides that are not available in GCP, or just the fact that setting up exact replicas of processes is hard for migrations like these?
That sounds to my self-hoster ears as an expensive way to do self hosting. Isn't at least half the point of AWS to use their SaaS thingies that all integrate with each other, I think people now call it "cloud native software"?
Not that most of our customers whose AWS environment we audit do much of this, at least not beyond some basics like creating a vlan ("virtual private cloud"), three layers of proxies to "load balance" a traffic volume that my old laptop could handle without moving the load average above 0.2, and some super complex authentication/IAM rules for a handful of roles and service accounts iirc
(The other half of the point is infinite scale, so that you can get infinite customers signing up and using the system at once (which hopefully pay before your infinite AWS bill is due), but you can still do that with VPSes and managed databases/storage.)
The point of moving to AWS is often to benefit from their data centers and reliability promises. So VPC, EC2, IAM, maybe S3 have a clear point.
And one small note: apart from S3, virtually all AWS services are tied to a VPC, any kind of deployment starts with "ok, in what VPC do you want this resource?".
You can get 95% of the reliability for 10% of the price at any dedicated hoster. AWS just figured out the magic word "cloud" means they can charge you 10 times the price.
At Azure or GCP you pay a similar price but you don't even get the reliability so literally why would you use them? The only reason I see is that "cloud" means you can start instances at any time without a setup fee or contract duration. But with the amount of cost difference, you could have three times your baseline "cloud" load running all the time at a non-cloud hoster, and still save money!
It is an expensive way to do self hosting, yeah! I guess one reason is, sometimes it’s easier to just use one of big N clouds – e.g. if you’re in a regulated industry and your auditors will raise brows if they see a random VPS provider instead. (Or maybe not? If you’re doing that kind of audits I’d love to hear your thoughts!)
> Isn't at least half the point of AWS to use their SaaS thingies
It is. (That’s how they lock you in!) I think it’s okay to use some AWS stuff once in a while, but I’d be wary of building your whole app architecture around AWS.
I’m in the self-hosters camp myself :-) I’m building a Docker dashboard to make it easier to build, ship, and monitor applications on any server: https://lunni.dev/
This is sort of petty, but… your web page does the horrible scroll-and-watch-the-content-fade-in thing. This is annoying and makes me want to stop reading. It also makes me wonder if your product does the same thing, which would be a complete nonstarter. Seriously, if I’m debugging some container issue, I do not want some completely unnecessary animation to prevent me from seeing the dashboard.
Thanks for the feedback! No, the product’s UI is definitely on the pragmatic side: I think the only blocking animation we have right now is dialog windows sliding in (and we try to avoid these altogether!) Both the landing page and the app disable animations when prefers-reduced-motion is enabled.
I’ll rethink the landing page animations a bit later! (I was thinking about redoing it from scratch again, anyway :^)
I’ve just dropped you a note in the chat. We’re also on Swarm, and dealing with most of the stuff you address, and some more. Would love to contribute to Lunni, if you’re open to that :)
This is GCS implementing the S3 API incorrectly in a way that really ought not to break clients, but it’s still odd because the particular bug on GCS’s end seems like it took active effort to get wrong. But it’s also boto (the main library used in Python to access S3 and compatible services) doing something silly, tripping over GCS’s bug, and failing. And it’s AWS, who owns boto, freely admitting that they don’t intend to fix it, because boto isn’t actually intended to work with services that are merely compatible with S3.
As icing on the cake, to report this bug to Google, I apparently need to pay $29 plus a 3% surcharge on my GCS usage. Thanks.
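For context, this is roughly how one points boto3 at GCS in the first place (placeholder credentials; GCS issues HMAC keys for exactly this S3-interop mode) - and it's this officially-unsupported configuration that trips the bug:

```python
import boto3

# GCS exposes an S3-compatible XML API at this endpoint when used
# with HMAC credentials, which GCS issues for interoperability.
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="<gcs-hmac-access-key>",
    aws_secret_access_key="<gcs-hmac-secret>",
)
print(s3.list_objects_v2(Bucket="my-bucket").get("KeyCount"))
```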
> As icing on the cake, to report this bug to Google, I apparently need to pay $29 plus a 3% surcharge on my GCS usage. Thanks.
That's the price of a support contract, not a "bug report". And it's not "plus", it's "or": support costs $29/month or 3% of your monthly billing, whichever is greater. It comes with SLAs for fixing or working around your reported problems. Though obviously in this case they'll probably just tell you to use their own Python library and not boto.
Oh, thanks Google, if my cloud spend is more than $967/mo, then I don’t get dinged by the $29 minimum. But this is the price of a bug report, because I can’t file a bug report without paying it.
And this situation is bad business. Google advertises that GCS has S3 interoperability support. And they have customers who use it in its interoperable mode. Presumably those customers could use GCS’s biggest competitor, too. Shouldn’t Google try to make the S3 interop work correctly?
GCS and AWS are commercial products for which you pay, not open source projects you can expect to support you for free. I don't know what to say here, if you have something "serious" to do on these platforms then a 3% overhead for support seems like an obvious choice.
Honestly it seems to me like you're excited to have found a bug and want to report it for glory; we've all been there. But No One Cares about that stuff in the world of commercial software. They fix bugs for real customers, not internet rock stars. If you aren't losing even $29 (one mid-tier meal!) from this bug, well... does it even rise to the level of "yell about it on HN"?
It is also the reason to use proprietary bullshit services. If there were no utility gap, then a reasonable evaluation would conclude that a migration to them is not worthwhile.
The S3 system is proprietary to Amazon, and it's your fault if you're not using Amazon but you're relying on Amazon to not change it anyway, because they have no obligation to you.
The concept of object storage is not proprietary. You should be able to change your code to use a different object storage provider.
> Fast forward about a year, and countless hours spent by both my team and Azure solutions specialists (kindly lent to us by the Azure org itself), and we all agreed the six-figure bill to one of corporate daddy's largest competitors would have to stay!
Being tied to AWS and being unable to shake off a huge bill is not a trait of its competitors. It's a trait of AWS, and stresses the importance of not picking them as your cloud provider.
Also, I think it's unbelievable that a monthly 6-figure invoice, charged to a company with cloud engineers already on their payroll, is not justification enough to shift their workload elsewhere.
Low 6 figures is just a dev. If a team of 5 devs has to work on the problem for 5 years, then it will pay for itself in 25 years. Likely beyond any planning horizon of a company with yearly performance evaluation cycles.
In Europe, even the likes of Amazon pay their SDEs 70k/year. In Sweden, for example, Microsoft pays its SDEs south of 800k SEK, which is about 70k dollars/year.
Low 6 figures is an entire team of Microsoft SDEs working full time for a year.
Interesting, because until now I've held the same opinion in reverse.
Each case is special in how everything gets configured, but between the Azure, GCP, AWS, and IBM clouds, the smoothest experience in my case has been on Azure, based on Java and .NET technologies.
And we also have our share of support tickets across all of them.
Now, Azure back in its early days, 2010-2016, was kind of rough; maybe this is the timeframe you're referring to?
Do you think it would be any different/better if you had to migrate to GCP (for example)?
Do you think it was the migration itself or the services on Azure?
Having worked with all three, there are certainly things that suck about all of them, but I've found AWS "most reliable", though it also seems to have a large number of disparate services needed to do things that were simpler on Azure.
GCP was pretty meh, but depends on what services you used.
Azure is a good choice for .NET and SQL Server (Azure SQL or whatever it is now), but I'm not sure a service built for AWS is going to "just work" on Azure (or vice versa).
After using AWS and Azure extensively, AWS seems to be quite well engineered by some very smart people. The isolation between regions is extremely good, and the Availability Zone model is quite effective for building very reliable systems if you are willing to pay the cost of inter-AZ data transfer. My company has an Active Directory controller in 3 different AZs.
Azure is a mess designed by smart people with no time and little budget. Azure flat-out lies about the AZs they have, claiming two halves of one data center are two AZs.
That's okay, Microsoft will rename it to something else and completely change the admin UI and APIs next week. It will now be called Dynamics CoPilot OneAI 365 for Business OneCloud.
In my experience (worked for organizations that used everything from on-prem server racks, to Linode to AWS to Azure), complaints about cloud infrastructure are proportional to managed service usage. I rarely hear teams that largely rely on virtual machines (perhaps with a managed RDBMS) complain. They do have to maintain a little extra scripting, but that's a minor inconvenience compared to battling issues and idiosyncrasies of managed services.
I'm sure it's gotten better now but back in 2016 provisioning VMs in Azure took so long that we joked that every time you provision an instance a Microsoft engineer gets in a car to buy the server.
Reminds me of how my Swiss bank doesn't support transfers outside business hours. I have to imagine when I click "send" in the UBS app, some guy named Hans-Ueli receives a tape printout and goes into the basement to move some silver pieces from one drawer to another.
If everyone in this thread shitting on Azure is going off how it worked in 2016, the comments here make a lot more sense. I know "Microsoft bad" still lingers in online communities, but I have to say I’m surprised hackernews is still this anti-Microsoft. In my experience, both Azure and AWS have their issues; it’s not like AWS is some perfect offering, but you’d think that based on the comments.
I'm a little confused by this post. Obviously it's easier to maintain a plain VM than managed services. That's why people are paying a lot more money to the cloud providers for managed services, so they don't have to do it themselves. What you're saying is that this is essentially a pointless endeavor? I don't think this statement is entirely uncontroversial, since managed services are the main reason for many companies to migrate to cloud.
Using managed services is not a pointless endeavor – they can save you a lot of time (and therefore money).
Unless you need to switch providers, at which point it may take more time to adjust for differences in how those managed services operate.
Managed services are absolutely not the main reason for moving to the cloud. Companies do it for the flexibility that comes with renting the real estate/energy/hardware instead of owning it.
Yes! The longer response is that the closer you stick to standards, the easier a time you will have. VMs are a standard, with cloud-init and image formats, etc.
i.e. in 2025 managed Kubernetes is not _that_ different between providers
Heh, until you need to roll back a specific table in Postgres using their backup solution. IIRC, this is possible in AWS -- or at least, I'm 99% sure you can at least download the backup. In Azure? All you can do is restore the entire database, and you cannot download it.
I mean, if it was solely about renting machines, we’d all just use DigitalOcean, or EC2 on AWS.
People use things like RDS and EKS/GKE to avoid all the administrative overhead that comes with running these things in prod. The database or its underlying hardware has a problem at 1am? It’s Amazon engineers getting paged, not you (hopefully… assuming the fault hasn’t materialised into operational impact yet).
I've never seen a managed IaaS that saved time. It is marketed as something that can free you from hiring ops people, but you will absolutely need to hire some supplier relations people to deal with it. (And contract optimizers, and internal PR to deal with the fallout.)
It's different for fully featured SaaS. It's a matter of the abstracted complexity vs. interface complexity ratio that is so common for everything you do in software.
It's easier for the provider to maintain a VM provision. It's supposed to be easier for the customer to maintain managed services, but that's often debatable.
Cloud services are great when they work, but when something isn't working, you have no way to debug anything except maybe restarting the service, if that's even possible.
We had one customer that needed an IPsec tunnel to the VPC where the production servers were living. We didn't want to maintain such a setup just for a single customer, so we checked AWS's offerings, and look at that, they have a managed IPsec solution. Great.
Until the client called to say the tunnel was down, and the solution was for them to restart it on their end to resume the connection. Why? You can enable some logging to S3, but according to that, everything should work. What were we supposed to do next?
But even if you stick to just EC2, things can get weird. Our recent incident: an EC2 instance stopped responding, but the ASG didn't replace it; any action on it threw an error that the instance was not running, yet it was in the running state.
I wish I could better help my org see that. Luckily my boss agrees with me, but he's not in full control. Between the vendor lock-in, and the _almost but not quite api compatibility_ with OSS... I just dread as more teams adopt it.
Azure's anti competitive conduct is also the reason that AWS stopped lowering prices.
Before 2014 or so, AWS would periodically reduce prices on major services passing on falling technology costs.
Azure didn't like that, so they aligned their prices to AWS's, matching immediately the same discounts on the same service.
This is a form of predatory pricing, because the goal is to kill the incentive for competitors to reduce prices by denying them market share gains when they do.
We bought a company hosting on Azure. They used hosted Postgres and are hosting .NET services on Windows. Small infra, in the range of 200-300 cores and 1 TB of memory.
Every few days, M$ randomly shuts down random instances for maintenance or disconnects the network for >10 minutes.
We migrated off hosted Postgres because performance was a tragedy - then their India-based expert led us to use a different volume type, and after an instance restart the database didn't start up because of I/O latency. The expert hasn't wanted to meet for 3 straight days now, because he is busy. The RCA (half a page, written probably by some LLM) says it's not their fault, but the charts tell a different story.
The only thing where they beat GCP and AWS is a dashboard that loads everything so quickly... sadly, you can't, e.g., run 2 similar network operations in parallel, because they will fail or take 10x the time they would take when run one after another.
Of all the PaaS providers, Azure has the worst abstractions and services.
In general I think it’s sad that most buy in to consuming these ”weird” services and that there’s jobs to be had as cloud architects and specialists.
It feeds bad design and loose threads as partners have to be kept relevant.
This is my take on the whole enterprise IT field though!
At my little shop of 30 or so developers, we inherited an Azure mess, built abstractions for the services we need in a more ”industry standard” way in our dev tooling, and moved to Hetzner after a couple of years.
A developer here basically knows no different - our tooling deals with our workflows and service abstractions, and those shouldn’t change just because of a new provider.
1/10th of the monthly bill, and the money partly spent on building the best DX one can imagine.
Great trade-off, IMO!
Only two cases come to mind for using big cloud:
- really small scale: mvp style
- massive global distribution with elasticity requirements.
Two outliers looking at the vast majority of companies out there.
I am part of a team building an automation tool for cloud provider creation of Projects/Accounts/Subscriptions (depending on provider). Our primary provider is GCP, and implementing that was fairly easy. Some gotchas, but easily surmountable.
Now we have gone multi-cloud to Azure and we need to add support (we were historically on AWS but moved 95% off; we still have some teams on there, but we rarely build tools for it outside of Terraform modules). And the Azure API, MS Graph API, and the Go SDKs for both are the biggest piles of trash I have ever worked with. Everything is a pointer, even string literals need to be made pointers, but sometimes they aren't....
Documentation is inaccurate. Some APIs take just the ID, others take a full path. Some of it is documented; many APIs have the wrong one documented.
None of the APIs return related resources' IDs, you have to search for all of it. So many name-based searches. I had to add a caching layer for IDs during creation so I didn't have to look up the same resource over and over (we use a state machine for creation and it can be resumed midway, and other fun things, so we need a lot of checks and resume-based code).
Overall it is the worst designed and implemented cloud provider. I would never recommend or choose it if given the power.
I really wanted to like Azure because of how well it integrated with the rest of my tools, but I kept getting hit with VM availability limitations and UX quirks. I've never had issues getting machines in AWS, or feeling like my actions were taking effect.
I've also waffled several times on the Azure FaaS offering. I am now firmly and irrevocably at "Don't use it. Run away. Quickly.". The experience around Azure Functions is just too weird to get comfortable with. Getting at logs or other binary artifacts is a gigantic pain in the ass. Running a self-contained .NET build on a blank windows/linux VM is so easy it doesn't make sense to get in bed with all this extra complexity.
Ugh, yes. Lack of availability of resources in whichever region I happen to need them.
Also, things that break automation, like calling back to say your SQL server is up and running when in fact it’s not ready for another 20 minutes. I am half sure the terraform time_sleep resource was written specifically to counter Azure problems.
You missed the perfect middle ground between serverless and mouse-configuring an IIS VM: Azure App Services. It's the same service function apps are using once they advance beyond the trivial function and require longer runtimes or no spinup delay.
App Services takes some getting used to, but it's a locked-down Win Server/IIS container with built-in FTPS, self-healing healthcheck endpoints, deployment by pointing to a repository, auto-scaling options, and a 99.95% SLA.
A few years back, it was a bit of a dog performance-wise, but the modern CPUs have been no problem for a 2+ vCPU, Premium level SKU. Pricier than a VM, but dealing with security and updates for a webserver VM is a ton of work.
I believe any Azure user might be able to compile their own 100 reasons not to use Azure, and the same will be true for most big pieces of software.
Even as someone that had minimal exposure to other clouds, I could easily see how Azure user experience lags due to the lack of proper care.
The number of pages with a filter bar that won't work properly until you remember to click "load more" should clearly be zero at this point; this is an objectively bad pattern that has existed for years and should be "easy" to fix. But the issue will probably never be prioritized.
The fact is that unless tackling those issues is part of the organization's core values, or they are clearly hitting the revenue stream, they won't be fixed. Publicity and visibility of those issues will always be crucial for the community of users.
I have significantly more experience in AWS, but I've spent equal time building and securing infrastructure in Azure for at least two years now. While AWS is not without its rough edges, I'd pick it any day.
My number one concern with Azure is availability of resources. Working within US regions, we've had to shift regions during production rollout because one or more of the resources we needed -- a current gen Azure SQL database or App Service Plan -- were simply not available. Rolling out an inexpensive VM (think equivalent of a t3/t4g.micro) is always a ride too, between unavailable SKUs or excessive quota gatekeeping.
Spending gotchas exist on any cloud, but we also know someone who got caught off guard in a completely new way recently. In late December, the team needed to automate a database event once per day on an Azure SQL instance. Scheduled jobs aren't natively available inside Azure SQL, and so they reached for an elastic job agent. Everything went smoothly until someone dug into a price increase on the January bill and asked why Sentinel had jumped from under $200 to over $3,000.
A colleague and I helped them dig in and quickly discovered that the controller for the elastic job agent runs dozens of batches per second in order to schedule that one job per day. With default security audit settings in Sentinel to meet compliance obligations, this generates over 600GB of BATCH_COMPLETE log messages per month, at a cost of $5/GB for ingest - and 600GB × $5/GB is right where that $3,000 bill came from!
A vastly underrated cloud, if you’re a small company and don’t operate containers, is Cloudflare. I know they get criticized for other reasons, but their DX is actually really great if you’re tired of the big 3 (4?).
IME even a small company runs into something you need an actual server for, and then you're suddenly spread across two clouds because Cloudflare is serverless or nothing.
Yeah, I don’t understand why Cloudflare isn’t competing head-on with cloud providers. Imo they should acquire fly.io, give them a blank check and 2 years, and I think they could take down AWS.
The reason AWS is dominant is that it’s the default and a known quantity. But developers and cost-conscious organizations will look at alternatives. Not saying it would be easy, but the prize would be huge. Plus, AWS seems in chaos, with a lack of sensible leadership.
Same, I see it as the single missing piece in the puzzle. If they had a VM product then for small businesses it'd be such an obvious choice, especially if they provide anywhere even just half the nice integration with their other products as they do with workers.
I'm guessing it's just because they're so extremely all-in on every single one of their products being on the edge, and you can't make that work with VMs without becoming far more expensive than any of their customers would pay.
Or maybe I'm clueless. But it sure looks that way from the outside.
I moved off a fleet of VPSes onto Cloudflare Pages and subsequently back to VPSes again due to unpredictable latency, several cases of downtime in 12 months and weird bugs around static assets disappearing long before the advertised retention date for old deployments.
It’s a JavaScript/Node-like environment only right now, no? I love it for my Svelte site, but it’s a massive limitation to be locked in based on language and request-response-based constraints right now.
You need at the very least containers and persistent volumes to be interesting to me at least.
We ran our whole platform, written in Rust, on Cloudflare Workers. It was not a great experience. You need to use their SDK, which had really interesting bugs that never got fixed (we always forked a version). It was pretty hard to test anything locally; you just had to deploy your code to their platform, which took time and made the feedback loop so slow it blocked us from delivering features fast enough.
And yes, you can test your local Rust code. It works nicely on your machine, but breaks with a really nasty error on their platform.
The target is `wasm32-unknown-unknown`, which leaves you with `fetch` as your only source of IO. OK, their Workers have a hacky socket implementation nowadays. Non-standard, of course. And most of the ecosystem won't really work without forking everything and fixing the bugs yourself.
We pivoted to a native Rust project. We still have one worker running in Cloudflare. We isolated that code from the workspace so that renovate updates will not touch it. You know, a random version upgrade might break the service...
Interesting choice to support Rust over Go, if you ask me. I don't have numbers, but I don't really peg Rust as a popular language for serverless web apps, certainly not to the extent of Go.
I work in both AWS and Azure and let me tell you, one thing I absolutely love about Azure is their portal. It’s like AWS 2.0 where all the cloud cruft is abstracted away and all that is left is the knobs you actually need to turn, and how they relate to one another.
I love me some AWS, but my god every time I have to dive into an unfamiliar environment and try and reverse engineer how everything connects - I need a drink afterwards.
I’m having a really hard time believing you are serious. The one time I tried out Azure for a few days, the portal was absolutely painful. Every click would take 5-10 seconds for a response. Sometimes basic settings-change actions would take 2+ minutes of watching an Ajax spinner. How can anyone enjoy working like that???
Sure, the UI is sluggish, but at least you don't have to move through three different "services" to find the routing table that your VM is using.
AWS UIs are generally snappy and smartly designed individually, but they are horrendously organized at the general level. AWS is built as if you are exploring a relational DB containing your resources, instead of a deployment tree.
Your VM doesn't have a NIC in AWS, it has a foreign key to your entire VPC's NIC table, which lives in the VPC service, not the EC2 service. And then your NIC doesn't have an associated subnet, it has a foreign key to the subnet. And then when you get to the subnet table, you look up the routing tables table, and finally in the routing tables table, you'll find the settings for the routing table. This all works through following links, but the constant context switching and tabs upon tabs that AWS UI requires are extremely unpleasant for me at least to use. I'll take Azure's sluggish UI that organizes all of this in one-two pages instead of four any day.
AWS pages are built by different teams, and it shows. We're all supposed to use IAC though, right?
In all seriousness, even in the face of IAC, the one thing Azure can do that AWS can't [at the time this happened to me], is have a global view of everything that's running right now and costing me money. It was years back, it was a $5 bill, but the principle of it had me livid. I did my best to tear down everything after my evaluation, yet something was squirreled away costing money.
So yeah, absolutely, sluggish UI all the way (I also find the Amazon storefront profoundly ugly and disorganized).
ENIs are under EC2 in the console, not VPC; on the API/CLI they're all under ec2, together with all networking.
If you click an instance and go to its networking tab, you get a list of ENI IDs that are clickable links to the resource; same for the VPC and subnet. If you click the subnet, you can just click the route table tab, so from an instance's networking tab the route table is 2 clicks away.
But rather than doing this, you could use Reachability Analyzer, which lets you check routing tables and security groups for a source and destination IP/resource and port, on the same or different VPCs connected with peering or TGW, and it will tell you if you're missing routes or SG rules in either direction. I created a Slack bot that let our devs input src/dst IP/domain and port, and it used this API to do the check for them - saved a lot of time troubleshooting.
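For the curious, such a bot boils down to a few Reachability Analyzer calls - a boto3 sketch with placeholder IDs (the analysis is asynchronous, so you poll for the verdict):

```python
import time
import boto3

ec2 = boto3.client("ec2")

# Describe the path to check: source, destination, protocol, and port.
path_id = ec2.create_network_insights_path(
    Source="i-0123456789abcdef0",       # placeholder source instance
    Destination="i-0fedcba9876543210",  # placeholder destination
    Protocol="tcp",
    DestinationPort=5432,
)["NetworkInsightsPath"]["NetworkInsightsPathId"]

# Kick off the analysis, then poll until it finishes.
analysis_id = ec2.start_network_insights_analysis(
    NetworkInsightsPathId=path_id
)["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"]

while True:
    analysis = ec2.describe_network_insights_analyses(
        NetworkInsightsAnalysisIds=[analysis_id]
    )["NetworkInsightsAnalyses"][0]
    if analysis["Status"] != "running":
        break
    time.sleep(5)

# False means a route or security group rule is missing along the path.
print("reachable:", analysis.get("NetworkPathFound"))
```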
I had an absolutely horrendous time working in Azure a few years ago (as a network engineer), we did have quite a complex setup with custom route tables and Azure Firewall though and VPN connectivity between Azure and AWS, but stuff like their VPN gateway taking 40+ minutes to change instance size on, wtf? I've filed 2-3 bugs to AWS in the almost 10 years I've worked with it, all for newly created APIs/services, they were all fixed within a week or two. I filed 8+ bugs to Azure in the first month using them, none of them were fixed as they had workarounds instead. And their documentation is absolutely useless, I could never trust that I understood what I read correctly, I always had to verify that it worked that way by testing it.
must be a paid troll, no self respecting intelligent engineer would find the Azure portal good. it’s horrible ux, really convoluted and complicated, very unintuitive, horizontal scroll is a joke when the web scrolls vertically, tiny fonts making everything hard to read and screens overloaded with so much shit and yet they managed to not put on the screen the main thing that developers would care about. it’s a complete joke
My god, I've been sending feedback about this shit for 3 or so years. You can't open any-fucking-thing in a new tab. The funny thing is, it used to work and they fucked that up.
I don't understand how dumb you must be to design a web site that way. It's like a brewery that sells their beer in plastic shopping bags and thinks that's good.
I think OP is trying to differentiate between Azure APIs, which are unbelievably slow and horrible, and the UI design itself - the layout, the font, how one screen will flow to another screen, what links to what, how it would be laid out in a tool like Figma.
Azure's APIs are atrociously slow. Azure's UI design is pretty nice. There's not much the UI designers can do about their API colleagues.
I know we're talking about AWS and Azure here, but had to add that fwiw, the M365 admin interface(s) are so bad it practically feels like a prank. In other words, it's as though someone is purposely making them as chaotic as possible to what end I can't even guess.
I think it was the Intune interface I was in the other day that had the same link underneath 4 different sections of the dashboard, which I noticed when I had all 4 of them expanded at once. That got a good laugh out of me.
"Here...don't miss this settings page! Seriously! Look!"
Not familiar with the product. The MS name alone would make me biased. Is it really good, or even better in some way? Or did you just mean, slightly ironically, that they did not manage to make it worse than competing products?
A great thing about mice is that they are fungible and don't change without the user's consent, unlike software, so you can keep buying and using the same mouse forever.
The average mouse of the time was blocky and uncomfortable.
This comment reads like rage bait (I am not saying your opinion is "invalid" or that you're lying). I've never met anyone who likes the Azure portal lol, even people who live inside of the Azure ecosystem hour by hour.
The Azure portal has some nice ideas - in theory, being able to divide stuff into "resource groups" works a lot better than the AWS approach of "divide resources that should be isolated from each other into separate sub-accounts".
In practise, even the good ideas are implemented poorly.
That would have been nice. AWS kept sending me bills for $0.00, and after multiple tickets over a couple of years, I finally deleted my entire account due to how pathetic their support was (they never figured out which service was active, and I couldn't find a way to figure it out using their UI).
AWS is actually great once you've spent a few dozen hours in the service. If you are using it for the first time, GCP feels a lot smoother, but then you begin to hit corner cases, and GCP just breaks in those. Azure is bad the first time and gets worse over time.
My experience too. I could never find anything even when I knew it was there, and I was told by my boss to use it. I stood there for half an hour, credit card in hand, then went to AWS, where the equivalent can be located by mortals. It's like they don't want anyone's business.
Yeah I don’t get this thread at all. I’ve used both fairly extensively and while Azure’s dashboard is still a pain in the ass, it’s better than AWS by a mile usability wise. Not to mention, Microsoft clearly puts time and money into their documentation, while AWS docs have always sucked.
Most of my complaints about Azure come down to the UI. So many head-scratching moments, and if you don't have a 4k monitor, lots of scrolling of menus inside of menus.
How do you find the price difference? Whenever I've done comparisons, they have always worked out significantly more expensive than AWS, and AWS is already pretty damn expensive for anything that requires a decent amount of compute.
I’m in the same boat. To me it’s bad UX. I can never find anything and it just looks way too complex to use. It shouldn’t be this way but as someone that uses Microsoft Entra, I guess I’m not surprised.
Not on laptops, which you'll be using if you are on call. It's even worse if you don't have the eyes of a 20-something-year-old and have the text scaled up a bit.
As an Azure Ops person spearheading a migration off AWS and another cloud, it's pretty funny. Some of it is nitpicky, some is sharp edges I've cut myself on, and some is Microsoft refusing to update the required TF version because they are bending over backwards for compatibility reasons, which is beyond frustrating.
However, all the comments about the Portal are baffling to me. The AWS portal is just all over the place; I feel like people are expecting AWS awfulness, and when a portal tries to be consistent, it breaks people's brains.
Oh yeah, Day 313: a Public IP to put into DNS. Alias record that, you noob. :P
For what it’s worth, I’m a student, and have had the benefit of seeing both the AWS and Azure web interfaces for the first time in the past couple of years. Azure was astoundingly more intuitive and less bizarre than AWS as someone with no experience working with the big clouds.
Even doing classwork involving AWS was an exercise in frustration. I couldn’t actually believe the sort of button trails and on-hover menus I was told to use to access various functions.
I don’t have the experience to evaluate the technical functionality, or whether AWS’s interface is better for experienced users, but I can definitely say it was far less approachable as a novice.
Yeah I don’t get this thread at all, AWS is a usability nightmare, as are all Amazon products. Microsoft products aren’t great usability wise but they’re clearly better than Amazon imo. I have a feeling a lot of the commenters here last used Azure when it was still in beta and the dashboard was live tiles Win 10 style.
Last time I tried to use Azure, it did not even offer domain registration. This was many years after launch - and not too long ago.
Not sure if this was an Australia/Oceania limitation - or just an ongoing product limitation.
My requirements weren’t complex. I needed to manage my domains (not AD), spin up virtual machines, and associate the two.
I also found the UI, overall, tedious. Finding the right offering under their ambiguously named services was difficult. And this comes from an AWS user.
I wanted to like Azure, but for at least the reasons above, it's not the product for me.
Did you want domain registration, or just DNS management? Those are two very different services. They offer the latter but not the former. So while you (generally) have to buy domains elsewhere, you can then manage them entirely within Azure after doing so.
For me, Azure is very good as an identity provider, doing SAML/OAuth/OIDC for in-house and 3rd party applications. It works wonders and is very cheap. I mean, I think it is the best IdP out of all the SaaS offerings on the market.
The cloud part (VMs, k8s, etc.) is something that I touch only if I am forced to. Even creating a VM is way more complicated than it should be.
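To give a feel for why the IdP side is pleasant: everything hangs off the standard OIDC discovery document, so a client can bootstrap itself from a single URL. A minimal sketch in Python with requests, where the tenant ID is a placeholder:

    import requests

    TENANT = "00000000-0000-0000-0000-000000000000"  # placeholder tenant ID
    url = (f"https://login.microsoftonline.com/{TENANT}"
           "/v2.0/.well-known/openid-configuration")

    conf = requests.get(url, timeout=10).json()
    print(conf["authorization_endpoint"])  # where users go to sign in
    print(conf["token_endpoint"])          # where the app redeems auth codes

Because it's plain OIDC, any off-the-shelf client library works against it the same way.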
I'm curious why people pick Azure, if anyone here has direct experience with making the decision.
I work at a startup that runs on Azure, and we're only here because of Microsoft's monopolistic behavior. We switched because Microsoft gives Office 365 discounts to our customers as long as all the SaaS services they use are hosted on Azure, and so our customers demanded we use Azure. Part of the monopoly playbook: "using a monopoly in one area to create a monopoly in another".
I used to work at GCP, and I thought it was almost shameful that we were in 3rd place behind Azure. Now it just makes me mad (especially since I had to migrate our startup from GCP to Azure).
Pretty similar reason: customers use Azure, so the incentives are in place to run more things on Azure.
Case in point at work: we need to set up Azure infrastructure per-customer. Hitting the Azure RM endpoint from outside the Azure network is not reliable; the API endpoint's DNS record points to one of two IP addresses in westus, and when the DNS record flips (presumably for blue/green deployments) the no-longer-referenced IP address immediately aborts the connection. The official Azure Terraform provider throws an error when this happens and it usually results in Terraform state losing track of something that it already created. Azure support just says "well all we see is 200 OK from our side".
The "solution" is to run the Terraform workload from within Azure. The SLA is only really guaranteed if you're connecting to the Azure RM API from within Azure. Cue the insanity.
In my case, the company I work for isn’t a software company. The bean counters / IT group would rather just have something tacked onto their existing Microsoft subscription vs. something entirely different.
Also, I suspect part of the reason people are hesitant to use GCP is that Google is perceived as a company that will gladly kill products off on a whim. Not great for something mission-critical.
From what I've seen, when people pick Azure, it's always about money.
Whereas with other providers, it can also be about money (a big discount, or a cheaper migration partner), but it can equally be about wanting one service/feature that only X provides; and once you're in, people tend to prefer to put everything there instead of doing poly-cloud.
In my region, Azure salesmen are very active, providing huge discounts, so Azure is the most popular amongst big companies.
Meanwhile smaller ones will go on AWS because it's easier to find information and (actually) knowledgeable people.
I used to work at a company using AWS: everything was managed through Terraform and we were as cloud-agnostic as possible (mostly containers).
Then we were acquired by a bigger company with an Azure deal, so they told us to migrate from AWS to Azure.
They provided us with their own experts to help us, but six months in, we were still unable to have anything remotely viable for UAT.
The experts were starting to acknowledge that even with their years of experience, they still weren't convinced by this whole Azure thing, so they actually relied heavily on a legacy on-prem DC.
That's when I left.
Last time I heard about old coworkers, the product was still running perfectly fine on AWS, while there was still a team working on the migration.
It's been more than two years now.
And I had other bad experiences with Azure.
I know that cloud providers are not fun if you don't start with two weeks of training, so I try to stay open-minded, but no matter how many Azure experts I talked to, I never found one who was actually confident in using it.
- we use mainly GCP
- we do not want to use AWS because of a random political issue (absurd, in my opinion, but whatever)
- we are getting fleeced by GCP and would like an alternative to help keep the price "acceptable"
How does this work? Do they demand that you use Azure for your servers so they get a discount? Or do you have to create instances of your product in e.g. VMs that are put in their Azure account? Did you have to completely leave every last bit of GCP behind? How is this checked by MS?
What you call "monopolistic behavior" a senior/lead engineer with a brain will see as ecosystem compatibility. If you are working within an org that already uses Microsoft for everything, why the hell would you bother introducing a new stack everyone else will have to learn, rather than use the Microsoft offering if it's competent enough? On top of that, the Microsoft product will most likely work more nicely with other MS products. Same reason people buy iPhones and Macs and stay in the Apple ecosystem. Yeah, it's not as hip and exciting, but enterprise development is rarely hip or exciting. At a startup not already using MS products, yeah, no shit, you can use whatever you want with little to no consideration for compatibility within the stack, especially when your main goal is cost savings.
At my previous company, we lost countless hours troubleshooting a Dockerfile that worked everywhere except on Azure. It used Node 18 as the base image, and the solution ended up being a chown 0:0 on everything inside the container - which took an absurd amount of time during every deployment.
Yes, I believe Docker/AKS somewhat recently defaulted to least privilege for container users, so you end up having to explicitly grant access to every little thing...
I am constantly forced to use Azure by idiotic companies which use .NET and the entire .NET monoculture which fetishises Azure, and I can say with a clear conscience that Azure is the shittiest, dumbest, most ill-engineered clusterfuck of a cloud that has ever been unleashed on developers. It's so bad that in the last 5 years even some of the most die-hard C# shops in the UK have changed their leadership and started to move away from Azure, because they cannot afford to ignore the absolutely insane state of it. Literally at every juncture where Microsoft could have gone with a feature in Azure one way or another, they somehow managed to not only pick the worse of the two, they somehow managed to bastardise it even further, beyond anyone's imagination.
I'm running two k8s clusters with 6 vCPUs and 12 GB of memory each, and when we got the quote from Azure it wasn't pretty. The managed PG was a deal breaker for us.
P.S.: side gig with 1 QPS on the EU cluster and way less in the US.
P.P.S.: happy Digital Ocean customer
Sometimes I create a web service for fun in a single PHP file, using PDO to connect to a database, and then upload it via SFTP to my server. It scales pretty well unless I'm becoming the next largest tech company in the world.
Lol. I would follow an account like this for something I use a lot. Like C# as a language, Unity, hell, even Chrome. Has the energy of the "Linux Sucks" talks. Usually the undercurrent is that something that can have so many reported faults is something worth having a love-hate relationship with :D
That said, I only use Azure for redundancy. Hosting the one app I have there on anything but Azure would be pretty much infinitely cheaper, especially when my IO quota goes above ~1 GB a month, which with Azure instantly forces a switch to a $20 plan for the month.
Ah, most of the business guys are uncompetitive and make money just by being in already-lucrative businesses. That's why there are solutions built on Microsoft technologies.
I quite like Azure. Microsoft was also way more responsive than other cloud providers when we were looking to shift providers, we got free 3rd party consultation to setup our infrastructure in azure, it all ended up being more for less for us. It's all setup using infrastructure as code which is pretty maintainable and easy to add new stuff. Almost everything can be done via command line. Don't really use the portal UI to set anything up, but we do use it to look at the state of things. Haven't really had any problems with any of the services.
Boards works for basic Kanban projects, but if you want to delve into scrum stuff like sprints, burndown charts, etc., it's very bad and cumbersome. You'll have to do a lot of stuff manually that Jira does automatically.
Wiki... It's not good. It's extremely slow, and lacks a ton of features that Confluence has.
Pipelines is godawful, and has been suffering severely from the migration from their "Classic" pipelines to YAML. The funny part is that if you go in depth into YAML pipelines, you'll notice there's a very large number of things that aren't configurable via YAML. It also has a ton of bugs, many of which have been open for over 5 years. To make matters worse, it's currently in an identity crisis with GitHub Actions (which has more features and is continuously getting new ones ahead of Pipelines).
I don't know what the future of Azure DevOps is; honestly, I feel like they'll eventually shutter it and move everyone to GitHub Enterprise.
Azure DevOps contains a lot of things, like a Jira equivalent and Azure Pipelines, etc. The Jira-equivalent interface is confusing, but it's not a showstopper; you learn to live with it.
We use it daily. The other option we had was a combination of self-hosted GitHub (which at first didn't have Actions), Jira, and Confluence. When Actions was not yet available, ADO was used for pipelines, so that was 4 services.
Give me 1 integrated service built on the same stack as Azure any day. Built-in service connections, managed identities, etc.
Been using Azure for a few months now, mainly the AI part (AI Foundry, AI Search), and it feels like a product run by juniors with no guidance. One day, the entire PromptFlow feature was down - no status, no info.
To create a new deployment (which is basically a PAYG model that doesn't really require this limitation), you'd have to switch to the old UI to find the button to create one.
I agree with the comments about cloud providers: most applications would be better hosted on a VPS or on services like DigitalOcean. But hey, software developers like to look smart by complicating things.
As a consultant who makes most of their money off of Microsoft technologies, I'm finding more and more reasons NOT to use Azure.
Chief among them is their famously bad support. Just google it - I've never spoken with anyone, or seen a single written word, saying that Azure support is even decent.
It's a race to the bottom platform, and I'm starting to get to the point where I want to start selling AWS.
I cannot count the number of times I've found a Microsoft support forum question that's exactly my problem too, and the officially tagged Microsoft support person fully misunderstands the question and then doesn't even properly answer their own misunderstood version of it.
I see this title and all I can think is, "There's only 400?" I am not impressed with Azure, I wish my company was using AWS, everything about AWS was much more reliable. The Azure Portal is not to be trusted, it can just lie to you at times.
It's like this in a lot of subreddits dedicated to a particular product. The regulars are die-hard $product fans, and respond to perceived negativity just as you'd expect.
Just earlier this week it was charging me for some mysterious "api management service" --- it was such a pain in the ass to cancel. I had no idea what it was about. I contacted support, got them to supposedly reverse the charges and reported the credit card as lost before the charges went through just in case. I just wanted a damn api key and couldn't for the life of me figure out how to find it (I work at a cloud computing company, I do this stuff for a living and I still couldn't figure it out).
It's an obtusely confusing interface with opaque pricing, full of innocent-sounding things you can just magically sign up for. It's like the dark-pattern people from Intuit came over for a house party and got drunk one night.
The great thing about Azure is not the security, it's the reports about how secure you are. The latter is legally required; the former is only visible to experts.
Microsoft's cloud is hosting/protecting stretchoid.com, from which I get scans (hack attempts?) all the time. I am self-hosted, and those are a pain. As far as I know, stretchoid.com is not selling the scan data...
I plan to drop IPv4 and go no-DNS/IPv6 (a /64 or /96 prefix) for self-hosting.
The Microsoft Graph SDK is the worst piece of shit I ever saw. The ONLY actually good part is the JS SDK. (I know, right?!) Aside from the horrible DX, it is FULL of bugs.
I was tasked with a project where it made sense to use Azure Durable Functions. Again... BUGS ... I reported a couple of them and even went and spoke with the product team about those. One BUG was due to a misunderstanding of how the framework works (in my defense, the documentation was very unclear) and the rest of the bugs are still not fixed almost 2 years later.
I decided to fail the project and restart with a different approach and framework.
Working on enterprise or higher-level Microsoft is a way to get grey hair fast. All the way back to Server 2003, we had the infuriating inconsistency of group policy, roaming profiles, and DFS drives. Everything is full of errors; you will have a larger IT team as a result, just to deal with the headaches.
After using Google Workspace for IT, and AWS for infra, I always tell people to stay far away from Microsoft.
Even now, I have a friend who honestly can't deploy Intune, because of inconsistencies in the "type" of enrollment and whether it can execute a winget script as a result. Despite both machines being enrolled in Intune, the one that was enrolled during OOBE can run the scripts, but the machine enrolled from within the OS cannot. Microsoft support has had that ticket for weeks.
Meh, as a developer who lived through the 2000-2010 era Microsoft, it’s easy to come up with a laundry list of reasons to hate on Azure.
But I tried Azure for my most recent startup because I was offended by AWS, and GCP did not have enough adoption among my customers, and Azure worked - fine.
What do you really need out of a cloud?
I want them to rent me VMs, for them to not go down, and to make it easy to do standard stuff like an object store, run containers, run databases, etc.
That’s how clouds try to lock you in, by making you use a custom tool that is different for the sake of being different.
If you use standard tools you don’t have this problem.
Containers running on VMs is standard.
A mesh of microservices that depend on cloud queues and managed services is not.
One argument against standard containers is saving dev time. You can still save dev time by using standard open source software. How many different ways are there to implement a queue or a load balancer?
If you really need access to some proprietary technology, then by all means use the cloud that offers it. E.g., if your customer demands GPT-4.5, then go with Azure.
But if you need something standard, don’t get caught in the trap.
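To make the queue point concrete, here's a minimal sketch of a "standard" queue on stock Redis, assuming the open-source redis-py client (names are illustrative). Nothing about it is tied to any one cloud; it runs the same against a cheap VPS or a managed Redis anywhere:

    import redis  # plain open-source Redis client (redis-py)

    r = redis.Redis(host="localhost", port=6379)

    # Producer: push work onto a plain list used as a queue.
    r.lpush("jobs", "encode-video-123")

    # Consumer: block until work arrives, then process it.
    _key, job = r.brpop("jobs")
    print("processing", job.decode())

Swapping the host from one provider's VM to another's is a one-line change, which is exactly the portability the proprietary managed queues take away.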
I am an older guy who was building Kubernetes clusters before EKS, AKS, GKE. So I used Terraform to build shit out to make it happen. Azure was 5x the code just to be different. You can try to blame Terraform, but if you used MS's custom tooling it was no different.
What about the way Terraform is a third-class citizen on Azure? And there are multiple ever-changing ways of doing everything, major parameters aren't supported, etc. It just makes everything more difficult to deal with.
> I will just say that Azure seems to want to do shit different for the sake of being different.
That's Microsoft's MO in a nutshell, in my experience, and I say this as a recent (~5 yrs ago) convert to Linux who built a career on Windows endpoints, servers, ADDS, Exchange, SCCM, you name it. It's how they achieve lock-in to their ecosystem, and it's incredibly frustrating to see how they've just layered that method of operation over and over again, decade after decade, rather than fix anything.
Conversely, doing things "the same way" as AWS would mean copying their first-generation public cloud design flaws.
The overall UX of AWS is absolutely crazy. It's easy to "lose" a resource... in there... somewhere... in one of the many portals, in some region, costing you money! Meanwhile, Azure shows you a single pane of glass across all resource types in all regions. It's also fairly trivial to query across all subscriptions (equivalent to AWS accounts).
Similarly, AWS insists on peppering their UI with random-looking internal identifiers. These are meaningless and not sortable in any useful way.
Azure, in comparison, allows users to group resources under "English" resource group names, and then resources within them also have user-specified names. The only random identifiers are the subscription GUIDs, but even those have user-assignable display name aliases.
The unified Portal and scripting experience of Azure Resource Manager is a true "second generation" public cloud, and is much closer to the experience of using Kubernetes, which is also a "second gen" system developed out of Borg. E.g., in K8s you also get a single pane of glass, human-named namespaces (= resource groups), human-named workloads, etc.
A single pane of glass that shows all your resources currently choking due to hidden limitations is no flex over AWS. It is my hope that I never have to use Azure ever again, professionally or otherwise.
This. We've used GCP App Engine for years and it is rock solid. Their SRE game is top level, and when there is an outage, they do a serious investigation and make it fully public, even if they screwed up badly. Including the vital "this is how we're going to stop this ever happening again". The last outage (that we noticed) was several years ago.
Azure tends to bite back when they upgrade their backend and everything breaks.
It happened twice with my Kubernetes deployment: first, something with node groups made them incompatible and I had to recreate the cluster from scratch; then one of their scripts for rotating key access to volumes (which one has to run manually, go figure) stopped working and caused my volumes to detach from pods, so I had to recreate the cluster again, and I just gave up.
I was super happy as well, the first two years. By year six I was fully migrated out.
I need it to be sane, to work, and to be reasonably well documented.
Azure fails outright on points 1 and 3, and limps by on 2.
The products are a confusing mess, there are way too many ways to auth things, the docs are garbage, and managing stuff via Terraform - which I tried multiple times - broke far too much to be excusable, to say nothing of the dumpster fire that is their UX.
I'm sure some people have either beaten it into submission, or have stockholmed themselves into putting up with it.
Asking: Okay, clearly there are a lot of people here with lots of experience running their software on cloud services from Microsoft, Amazon, Google, etc. Good.

But what about a solo founder running their Web site on a "full tower" computer they plugged together themselves? Why use a cloud server farm, with its expense and complexity?

Instead, get a motherboard, a processor with 16 cores and a 4+ GHz clock, 128 GB of main memory, some rotating and/or solid state disks for a total of 20 TB or so, some external disks for backup, a recent copy of Windows Server, applications software from .NET, and a 1 Gbps Internet connection. The computer - tower case, power supply, motherboard, processor, disks, solid state disks, and Windows Server - costs ~$3000.

So, a 1 Gbps Internet connection, ~$100 a month, would have a capacity of, say, 100 MB/s. If sending a Web page of 200 KB, the peak capacity would be

100 MB/s / 200 KB = 500 pages/second

Then 500 pages a second, with 5 ads per page and revenue of, say, $2 per thousand ads sent (CPM), would be

500 * 5 * 2 / 1000 = $5/second

at peak capacity, or maybe an average of half that, $2.50/second. At 16 hours a day (16 * 3600 = 57,600 seconds), that would be revenue of

2.50 * 57,600 * 30 = $4,320,000/month

For the electric power, at 200 W and $0.10/kWh, that would be

200 * 24 * 30 * 0.10 / 1000 = $14.40/month

How many users? With peak capacity of 500 pages a second and an average of half that, 250 pages a second, for 16 hours a day, that would be

250 * 57,600 * 30 = 432,000,000

pages a month. If on average a user is sent 5 pages per visit, that would be

432,000,000 / 5 = 86,400,000

visits a month. If users come on average 2 times a week, that is 8 times a month, or

86,400,000 / 8 = 10,800,000 users

from one tower case and some Web page software.

So, why use a cloud server farm with its expense and complexity?
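The same back-of-the-envelope as a tiny Python sketch, where every constant is one of the assumptions above (nothing measured):

    # Back-of-the-envelope for one box on a 1 Gbps link.
    # Every constant is an assumption from the text above.
    link_bytes_per_s = 100e6   # ~1 Gbps, rounded down to 100 MB/s
    page_bytes = 200e3         # 200 KB per page
    ads_per_page = 5
    cpm_dollars = 2.0          # $ per thousand ads sent
    hours_per_day = 16
    days_per_month = 30

    peak_pages_per_s = link_bytes_per_s / page_bytes   # 500
    avg_pages_per_s = peak_pages_per_s / 2             # 250
    secs_per_month = hours_per_day * 3600 * days_per_month

    pages_per_month = avg_pages_per_s * secs_per_month
    revenue = pages_per_month * ads_per_page * cpm_dollars / 1000
    print(f"{pages_per_month:,.0f} pages/month, ${revenue:,.0f}/month")
    # -> 432,000,000 pages/month, $4,320,000/month

Of course, the bandwidth is the only constraint modeled here; sustained page rates, CPMs, and fill rates in the real world would be far lower.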
Redundancy and failover capability, mostly (in terms of everything: power, network, storage, compute). For a hobby project, what you describe is probably fine. For a real business, do you want to tell customers you're down because your one computer shit the bed and you have to run to Best Buy for a new motherboard?
> For a hobby project, what you describe is probably fine.
Thanks! Yup, it will be "a hobby project" until, if ever, it gets some users and revenue. Then: have several servers, some load balancing and redundancy, uninterruptible power with a generator outdoors on a concrete slab, contact Cloudflare or some such and have them do what is needed, etc.
Just looked up SSL, reverse proxies, firewalls, etc. Okay.
If you're virtualized on your host: 2x HAProxy on top of OpenBSD, utilizing CARP. It's great fun to set up and run - and once you have it running, it's stupid stable. Very little maintenance required.
Your local site can't implement DDoS protection though; you have to buy that from Cloudflare or some other reverse proxy anyway, so the cloud is always in your future in some form. Also, your local site can't move to Australia or Taiwan or wherever when you realize your user base is more global than you thought.
I mean, your intuition is correct that "cloud" is mostly just a bunch of boring, standard computers running boring standard software and there isn't anything they do that you can't.
But at the same time, boring standard software is (by definition!) commoditized and if you're spinning up some new and interesting thing, it's only going to be differentiated by the parts that are not boring and standard. So put the boring standard stuff on a credit card and do the interesting stuff instead.
If you have the skills, and the bottom line is still positive (include opportunity costs and personnel costs, ease of getting SOC2 / ISO certification if that's relevant to you, ease of scaling up and down), then you should go for other solutions.
I advise CEOs of SMEs on this, and I can tell you that the main concern they have is the availability of people to build and maintain the systems. Because cloud / k8s is more popular these days, that's what they go for. If we could reliably find smart system operators who would happily maintain a couple of racks of servers for years, it would be a more viable option.
Where’s the firewall? Reverse proxy? SSL certificate management? High availability? Patch management? SCM? Central logging and alerting?
As someone who has run many IIS boxes since IIS 4, I greatly prefer Azure websites over having to worry about all the "other stuff" that comes with on-prem. Yes, the cloud needs an RP, WAF, etc., but they're always HA and simple services, not another box to maintain.
Careful, anytime the topic of self-hosting comes up, you will see a bunch of engineers crawl out of the woodwork to insist only the cloud can handle the complexity.
Because you might need 20,000 of those "servers" immediately, and not have the up-front capital for an investment like that. And maybe it doesn't work out, and you only needed those servers for 2 yrs vs. your depreciation rate.
And you'd need about 19,997 fewer of those servers if you got rid of all the scummy adtech infrastructure you're implementing and just focused on the core product. Unless your core product is ads and marketing data, in which case boil those oceans so that you'll be able to show mattress ads to someone who's 0.3% more likely to be influenced to buy a new mattress when they buy a pack of gum if they're running dark mode and have a Mac but use an Android phone and are located within 217 miles of Arkabutla, but only if it's raining.
I don't see colocation charges in there, unless you were planning on running your "server" out of your house on a residential internet connection (which probably has restrictions on acceptable use).
By the time you spec out a real server (redundant power, higher quality components than Newegg stuff), rent some space in a rack, pay for bandwidth (50Mbps is going to be about it without paying a premium), you're going to be looking at $5000 + $300/m. All that effort, whereas you could spin up something in the cloud for a bit more per month.
This does flip quickly, however. Once you get into the high 5-figure monthly spend, running your own hardware makes sense again. DHH's blog posts on 'Leaving the Cloud' are a great read.
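A toy break-even sketch using the hypothetical figures above (not quotes from any provider):

    # Toy break-even: ~$5000 up front + $300/mo colo vs. a cloud bill.
    # Figures are the hypothetical ones from the comment above.
    COLO_UPFRONT, COLO_MONTHLY = 5000, 300

    def breakeven_months(cloud_monthly):
        # Months until the colo's up-front cost is recovered.
        return COLO_UPFRONT / (cloud_monthly - COLO_MONTHLY)

    for cloud in (500, 1000, 5000):
        print(f"${cloud}/mo cloud -> {breakeven_months(cloud):.1f} months")

Which matches the intuition: at a few hundred a month of cloud spend the payback period is years, while at four or five figures a month it shrinks to months or weeks.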
If your stuff can fit on a single server at home, and are comfortable managing it, by all means do! It’s definitely way WAY cheaper, and if budget matters, that’s great. Nothing wrong with that IMHO. Obviously you can’t 100x overnight, but that’s realistically not gonna happen. And if it does, then you can start to migrate, which probably won’t be that hard, because it’s just impossible to make a single machine anywhere near as complicated as a cloud setup.
If you need an active component near your customers for low latency responses, the cloud makes it very cheap to deploy tiny VMs or small containers all over the place. It’s trivial to template this out, scale up and down for follow-the-sun or to account for local traffic spikes.
If you need it, you need it, and nothing else meets this need except perhaps some CDNs with “edge compute” capabilities — however those are quite limited.
Azure has its issues, but this kind of extreme take is hardly useful. Any large-scale cloud provider has problems - AWS has had its fair share of major outages too.