A couple of years back, I was working at Mojang (makers of Minecraft).
We got purchased by Microsoft, which of course meant we had to at least try to migrate away from AWS to Azure. On the surface, it made sense: our AWS bill was pretty steep, iirc into the 6 figures monthly, and we could have Azure for free*.
Fast forward about a year, and countless hours spent by both my team and Azure solutions specialists (kindly lent to us by the Azure org itself), and we all agreed the six-figure bill to one of corporate daddy's largest competitors would have to stay!
I've written off Azure as a viable cloud provider since then. I've always thought I would have to re-evaluate that stance sooner or later. Wouldn't be the first time I was wrong!
When I worked at Jet, a shopping website trying to compete with Amazon, we obviously did not want to give money to Amazon, so we used Azure.
For the most part it was just fine, until we started using CosmosDB (then called DocumentDB).
DocumentDB, in its first incarnation, was utterly terrible. The pricing was extremely hard to predict, so we would end up with ridiculous bills at the end of the week, and the provided .NET SDK was buggy and horrible, but the very worst part was that the web UI appeared to be directly tied to your particular instance of CosmosDB.
Why is this bad? Because if you under-provisioned stuff for your database, it might start going slow, and it would actually lag the web interface you would use to increase the resources! We got into situations where we had to turn off the entire application just to bump up the resources for Cosmos. It felt like complete amateur hour from Microsoft.
My understanding is that Cosmos has gotten a lot better, but man that left a sour taste in my mouth. If I end up getting some free credits or something, maybe I'll give Azure another go but I would definitely not recommend it right now.
A team in my org worked with Jet for 2+ years to help y’all scale.
It was interesting seeing the biweekly status updates; they basically all started with “This is how Jet.com broke Azure core services this week”.
As much as it sucks, this was a deliberate strategy all the way from Satya - every employee knew Azure was a joke, but the only way to actually fix shit was to get internet-scale customers to break it daily and weekly.
> but the only way to actually fix shit was to get internet-scale customers to break it daily and weekly.
I don't get it. There's lots of distributed systems theory that could provide a more robust, analytical approach to a scalable architecture. If a system is regularly breaking like this, it sounds like it should be a "back to the drawing board" moment.
> My understanding is that Cosmos has gotten a lot better
For some values of "better", I guess. Performance is still terrible, their data visualization/inspection tools are shameful, their SQL dialect is finicky and has no error reporting beyond "something is wrong with the input value", and their official Python SDK has a race condition bug that can silently clear out your documents when under heavy load.
I used to work at a Cosmos-heavy house and I would utter "fucking Cosmos" around 15 times a day.
> My understanding is that Cosmos has gotten a lot better, but man that left a sour taste in my mouth.
A couple of years ago I stumbled upon an Azure project which started off using the old-timey Cosmos DB. Looking at the repository history from those days, I saw a bunch of Entity Framework configurations and navigations and arcane wizardry that would take an engineer months to really figure out.
Then there was an update to CosmosSDK, and all that EF noise was replaced by single CRUD operations that took the unserialized object, id and partition key as input. That's it.
Worlds of difference, and a simple key-value query takes ~10ms to do.
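For reference, that point-read shape looks roughly like this today - sketched here with the Python azure-cosmos SDK and placeholder names (the comment above is about the .NET SDK, but the surface is the same idea):

```python
from azure.cosmos import CosmosClient

# Placeholders: account URL, key, database/container names, and the
# assumption that the container is partitioned on /customerId.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("orders")

# Upsert: just the plain object, with id and partition key in the body.
container.upsert_item({"id": "order-1", "customerId": "cust-42", "total": 19.99})

# Point read by id + partition key -- the ~10ms key-value query.
item = container.read_item(item="order-1", partition_key="cust-42")
print(item["total"])
```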
> Unless that query goes over the Internet to another continent, that's a really long time, isn't it?
If you're hosting a service in a cloud provider and you implement your services so that your call to the cloud provider's database goes over the internet to another continent, you have serious problems but none of them are caused by the cloud provider.
I don't think anyone making that sort of claim knows anything about cloud services. A single roundtrip of a no-op request within the same data center takes 0.5ms. Add querying across multiple partitions and data seeking, and you don't find cloud providers doing better.
To frame how oblivious that claim is, DynamoDB is lauded for its response times being sub-20ms.
No, that's exactly what I mean: I expect the round-trip within a datacenter to take roughly 0.5ms, and I expect the key lookup to take about that time or less, so 10ms is roughly an order of magnitude more than I would expect a "simple key-value query" to take.
For comparison, if you run your own hardware and do a memcached KV lookup against a different server on the same rack, p99 times are slightly under 1ms. Given the guarantees of CosmosDB, ~10ms isn't that bad for a p100.
No reason to apologise - I assumed and tried to give a better reference.
It is an absolute eternity though. A KV lookup is fractions of a microsecond in a managed language like C#. An HTTP request is in the microseconds range on localhost and a smidge more on a performant local network. A poorly behaved local network (busy wifi on an ISP router) is 1-2ms, and I can do a round trip to my nearest AWS region in 10-15ms from my home network.
It’s an absolute eternity, and when thinking about this stuff and scaling, think “is it worth slowing down the normal case by 1000x to introduce an external service”
ConcurrentDictionary<K, V> read latency is going to be around 7-15ns for the data hot in the cache, scaling with key length if it is string-based, and anywhere between 75ns and 150ns for reading that out of RAM. Alternate implementations like NonBlocking.ConcurrentDictionary can reach down to 3.5-5ns for integer-based keys given all data is in L1 and the branch predictor is fully primed, on modern x86_64 cores.
> Because if you under-provisioned stuff for your database, it might start going slow, and it would actually lag the web interface you would use to increase the resources!
What the ever loving heck... seriously?! Why wouldn't this be a control plane API that reconfigures a data plane?!
I think it was querying the database to get usage information, and it was blocking the loading of the page as a result. It's been a few years, but if I recall correctly, the .NET API for changing the database provisioning did work, so some people would hack up a quick script to change resources in an emergency.
I think they did fix it eventually, because I know that multiple people on my team complained to Microsoft about it. Very short-sighted decision on their end, but to be fair it was a brand new product.
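For flavor, this is all such an emergency script needs to do against today's SDKs - a sketch using the Python azure-cosmos package with placeholder names (the folks above were on .NET, and the DocumentDB-era "offer" API looked different, so treat this as an approximation):

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("orders")

# Read the current provisioned throughput (RU/s) and double it,
# bypassing the laggy portal entirely.
current = container.get_throughput()
print("current RU/s:", current.offer_throughput)
container.replace_throughput(current.offer_throughput * 2)
```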
What's so surprising about that? If your CRUD operations take longer to do, and you're doing those to drive a GUI, of course the GUI will lumber along.
It's good practice to separate out your control plane and data plane, so in this kind of scenario you can use the control plane freely to manage and scale up the data plane, without worrying about an under-resourced data plane affecting your control plane operations.
The reverse also applies; by separating them you can have issues with your control plane but not have the database go down.
It shouldn't lag the web interface. There are common ways to handle this, such as making all resource mutations asynchronous, or if you want to keep it synchronous, ensuring that control processes get guaranteed CPU resources. It is not a particularly challenging thing to do.
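To illustrate (my own toy sketch in Python, not how Cosmos is actually built): the control plane accepts the mutation immediately and a separate worker applies it, so a struggling data plane can never block the knob you need to fix it.

```python
import queue
import threading

mutations: queue.Queue = queue.Queue()

def control_plane_scale_request(new_capacity: int) -> str:
    """Cheap and always responsive: just record the desired state."""
    mutations.put(new_capacity)
    return "202 Accepted"  # the caller gets an ack, not a finished resize

def apply_to_data_plane(capacity: int) -> None:
    # Stand-in for the slow part: actually resizing the database.
    print(f"resizing data plane to {capacity} RU/s")

def reconciler() -> None:
    """A separate worker, with its own guaranteed resources, drives the data plane."""
    while True:
        apply_to_data_plane(mutations.get())  # may be slow; nobody is blocked on it
        mutations.task_done()

threading.Thread(target=reconciler, daemon=True).start()
print(control_plane_scale_request(10_000))  # returns instantly
mutations.join()  # demo only: wait so the background resize gets to run
```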
Your story reminds me of when Microsoft acquired Hotmail in the '90s and they tried migrating from FreeBSD & Solaris onto Windows NT/IIS. Having the world's largest email service running on the Windows stack would have been a huge endorsement. It took years until they were successful.
While I don't disagree with that, in my experience all Windows instability in the NT family (and I've worked closely with every end-user version of Windows, from 16-bit 3.11 to the recent Win11, with very few exceptions) is caused by faulty hardware and/or bad drivers that can't handle it.
I can't remember any issue that I couldn't attribute to bad HW or a 3rd-party driver.
Wrt Win95 and its kind - all processes in that family essentially ran in a single address space, and data "isolation" was "achieved" only through obscurity. If you knew some magic constants that were easily obtainable from disassembly, you could do anything there. So no wonder it was as bad as the worst program you'd installed.
Windows 2000 Server was peak Windows. All the subsequent versions just got harder to maintain as they gradually ruined the user interface. Nobody cares about the UI on consumer Windows, but if you're spending a lot of time in RDP, the Vista-based server products are terrible.
I don't hate Windows Server 2019, but Linux is better, easier, faster, and a relief after any futile attempt to use IIS or SQL Server in 2025.
Windows XP x64 Edition was pretty slick, and so was NT4. I agree that 2000 was pretty cool, but perhaps a lot of that is design nostalgia. It was very "serious business OS", where XP and Me looked like jellybeans and cartoons. My favorite Windows, though, is Win 7 Ultimate, Steve Ballmer Edition. I was sad when I had to upgrade to winten.
I get the nostalgia for XP, it was the first windows consumer edition that didn’t suck, but for a server OS 2000 was so lean and easy to manage it makes me wonder how MS lost to Linux. Back then, it was a genuine competition, now you’d have to be crazy to choose windows to deploy anything.
I wish MSFT could build Active Directory and the associated constellation of services on Linux. You can make a reasonable simulacrum with Samba but it isn't as well-integrated.
(My fever dream wish is for a "distribution" of NT that boots in text mode and has an updated Interix subsystem alongside Win32. Throw in ZFS and it would be awesome.)
PowerShell was 2006, so I suppose the real "peak Windows Server UX" was 2016, when PS was relatively mature and came out of the box with the latest version.
If MSFT had backported servicing stack updates to 2016, it would still be usable. As it stands, it bogs down unreasonably when applying updates and needs lengthy DISM /Cleanup-Image runs periodically to reclaim disk space.
I went from 98 to 2000 (rather than ME) and it was an amazing experience. It showed me what an operating system could be like. Of course, what I really wanted was Linux, but I didn't know better at the time.
I dunno how to compare stable to stable but I ran Win2k for so long that I got bored with it (something like 5-7 years) and never experienced a single crash. This is coming from a Linux guy btw… so I’m no Microsoft fanboy, just saying, it was as stable as any other stable OS.
I saw years of uptime on those systems whereas Win2000 iirc needed a reboot for every single update of the OS, and even for applications like IIS or Exchange.
Compared to NT4 it was probably very stable, since I remember telling most clients to just shut it down Friday evening and boot it Monday morning cause the pre-SP4 NT4 could not stay up more than three weeks.
Compare that to AS/400, where we pushed updates all over the country, without warning clients, to systems running in hospitals, and there was never even the slightest problem. It sounds irresponsible to do that today, but those updates just worked, all the time, and all applications continued to work.
SQL Server is really Sybase tho, which was always capable of running on UNIX.
Can't say much more, but I worked on a huge (internal) Sybase ASE on Linux based app (you've _all_ bought products administered on this app ;) ) way back (yes, pre-SSD, multi path fiber I/O to get things fast, failover etc.) and T-SQL is really nice, as is/was ASE and the replication server. Been about 20 years tho, so who knows.
I worked with SQL Server a bit, writing a Rust client for it back in the day. The manual is really good, explaining the protocol clearly. That made it really easy to write a client for it.
SQL Server uses NT and Win32 APIs, so the SQL team built a platform-independent layer. Meaning NT and Win32 are still used by SQL on Linux. It’s pretty cool tech.
The tone and content of this document is shockingly candid and frank. I think it did a ton to make Windows Server a better product. I have a lot of respect for the people at MSFT who reviewed the company's own product in such a critical light.
Of course. Why would you expect anything but? Pride is actually a very good driver of change if you ask me because people often do their best work when they are proud of what they are building.
The 90s were the dark ages of cloud computing. It was the age of the system administrator, desktop apps, Usenet, and the start of the internet as a public service. At the time, concepts such as infrastructure as code, cloud, and continuous deployment were unheard of.
AWS, which today we take for granted, launched in 2002, and back then it started as a way to monetize Amazon's existing shared IT platform.
Of course migrating anything back then was a world of pain, especially when it's servers running on different OSes. It's like the rewrite from hell, one that can even cover the OS layer. Of course it takes years.
> At the time, concepts such as infrastructure as code, cloud, and continuous deployment were unheard of.
There existed different names and solutions for things like cloud. I worked with Grid Engine in 2000 after Sun acquired Gridware, but that project started in 1993. By 2000 we were experimenting with running Star Office on the grid and serving UI to thin clients (kind of what Google Docs or Office 365 do now, but on completely different stack).
You've got me curious: what was the single biggest barrier to migration, if you're able to disclose it? I'm guessing it was something proprietary to AWS, like how they handle serverless or something that couldn't translate over directly, but I'm always eager to learn why a migration from X to Y didn't work.
This is a couple of years ago, so I fully expect most of the issues we had back then to be fixed by now, but it was definitely Azure that was the problem.
We wanted to use their hosted Kubernetes solution (I forget the name) and pods would just randomly lose connection to each other; everything network-related was just ridiculously unstable. Host machines would report their status as healthy, but then be unable to start any pods, making scaling out the cluster unreliable. I also remember a colleague I regarded as a bit of a wizard being very frustrated with CosmosDB, but I cannot for the life of me remember what the specific issue was.
Our solution was actually quite well written, if I do say so myself; we had designed it to be cloud-agnostic, just on the off chance that something like this would happen (there may have been rumours this acquisition would happen ahead of time).
But Azure was just utterly unable to deliver on anything they promised, thus the write-off on my part.
Ohh, AKS. I had the 'pleasure' of using it quite early on, 5/6 years ago. We kept killing it by actually using it with more than 50 pods. You know you're one of the few serious users when you get through the 3 layers of support and get to talk to the actual developers in the US.
But my impression is that it's better now. In general, my experience with Azure is that the base services, the ones that see millions of hours of use, are stable. Think VMs, storage, queues, etc. But the higher up you go in the stack, the fewer hours of use they see, and the lower the quality gets.
I appreciate the context! Honestly, every time I start thinking that K8s might be the "universal cloud" orchestrator I've been hoping for/working towards, stories like this remind me just how...tenuous it can be relative to traditional VMs and standard containers-as-appliances. Still working to get my Admin cert for the spec, but it's definitely not something that sparks joy, if you catch my drift.
Pods losing connection to each other is still very, very common in Azure.
To see the positive side of it, it’s a Chaos Monkey test for free. Everything you deploy must be hardened with reconnections, something you should be doing anyway.
What’s frustrating is that it happens rarely enough to give the illusion of stability, making it easy for PMs to postpone hardening work, yet often enough to put a noticeable dent in your uptime if you don’t address it. The perfect degree of gaslighting.
> To see the positive side of it, it’s a Chaos Monkey test for free. Everything you deploy must be hardened with reconnections, something you should be doing anyway.
Keep in mind that in Azure it's a must-have, whereas everywhere else it's either a nice-to-have or a sign your system is broken.
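For anyone wondering what "hardened with reconnections" means in practice, the core of it is just retrying transient failures with backoff and jitter - a minimal Python sketch with hypothetical names (tune the attempts, delays, and exception types to your stack):

```python
import random
import time

def with_retries(op, attempts: int = 5, base_delay: float = 0.2):
    """Run op(), retrying transient failures with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:  # substitute whatever "transient" means in your stack
            if attempt == attempts - 1:
                raise  # out of retries: let the failure surface
            # Back off before reconnecting; jitter avoids thundering herds.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Usage: wrap any cross-pod call that might drop mid-flight, e.g.
# result = with_retries(lambda: client.get("/health"))
```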
Having worked on the infra side of azure, I'm not surprised. Network is centrally managed and that team was a nightmare to deal with. Their ticket queue was so bad they only worked on sev 1 and the occasional 0. Nothing else got touched without talking to a VP and even then it often didn't change things.
Curious about details too. The parent's conclusion is to write off Azure, but I wonder if it's actually AWS or the way they use AWS that makes it hard to migrate.
Or to put it another way: if Mojang had started with Azure but couldn't manage to migrate to AWS, which provider would the parent write off?
My experience has certainly been that AWS is both a) more stable, and b) has more migration resources and guides than the reverse.
It was easier to go to AWS than to Azure; and I've done both in the past ~4 years. Migrating to AWS was just technical work. Migrating to azure was 'fight unexpected bugs and the fact it doesn't actually work in surprising situations'.
The only reason to go to azure was that Microsoft waved a big fat $$$ discount to use their services.
Yup, the hardest part about migrating to Azure was jumping between all the managed services that work everywhere else but are insanely buggy on Azure. We ended up with the most basic architecture you can imagine (other than AKS, which works great as long as you don't use any plugins) and we're still running into issues.
We have a very long list of Azure features and services that we've banned people from using.
Just got off a call with someone at Azure today who told us to set up our own NAT gateway instead of using Azure's, because of an outage where we made too many requests and then got our NAT Gateway quota taken away for the next 2 hours.
> Yup, the hardest part about migrating to Azure was jumping between all the managed services that work everywhere else but are insanely buggy on Azure.
Care to point out a concrete example? I've worked with Azure a few years ago and I wouldn't describe it as buggy. At most, I accuse them of not "getting" cloud as well as AWS. For example, the whole concept of a function app is ass-backwards, vs just deploying a Lambda with a specific capacity. That is mainly a reflection of having more years working with AWS, though.
My experience with AWS is the documentation can be bad/wrong, it can be difficult to find stuff, but the actual services are very solid. It does what you tell it to do. Stuff works.
My experience with Azure is that it simply breaks in ridiculous and frankly unacceptable ways, all the time. It's like someone is unplugging network and power cables every couple hours just for fun.
… I was scrolling back through the “400 reasons not to use azure” in the OP and off the top of my head I’ve seen a dozen of them personally.
You decide if that’s more or less stable than AWS.
I’d say the evidence is pretty empirical, but hey, all I can say is my experience was utterly unambiguous.
You can argue a lot of things, but hundreds of azure fails in a big giant list is probably one of the tougher ones to go “no, this is fine compared to AWS!” about, imo.
The networking is so, so bad in azure. We ran into all kinds of craziness with simple things that kept kicking us in the nuts like port exhaustion between different subscriptions. I quit my last job partly because they were wedded to azure.
Oh you guys finally managed!? Cool to hear! I guess Azure must've gotten better then, back when I was there, the conclusion was that Azure simply wasn't mature enough to host Minecraft yet.
Again, this is a while ago: I remember when I started, we were just starting to replace the old yggdrasil servers with the new micronaut-based system, which I think is still in use today?
I still remember that application fondly as the best-architected piece of software I've ever worked on. I hope all is well!
I literally cannot log in to it after it was forcefully migrated to Microsoft. Microsoft doesn't recognize my computer as not-a-bot. Something to do with being Linux, I imagine.
Oof, this won't help you, but I recall sending a very angry email to my new corporate overlords in the Xbox org for blocking me from using their web services if my user agent said Linux.
This was during the "Microsoft <3 Linux" campaign, and I think I cited that and then told em Minecraft would not be able to move forward with Xbox account migration until they stopped such idiocy.
Since I was the dev tasked with migrating Mojang accounts to Xbox accounts, I felt I had at least SOME credibility to my claim that it was blocking me.
But honestly, modifying my user agent was easy, it just pissed me off.
They did fix that the same day tho, so I guess they believed me!
Moved my Minecraft license to a new Microsoft account as required. Microsoft account was flagged and blocked when I checked a few days after. And that's how I got scammed out of my Minecraft license.
It wouldn't have helped if you hadn't, since they've now deleted all non-Microsoft accounts.
Reminder to whomever is reading: if you bought the game during alpha, you have the right to all future Minecraft games and a premium account forever. Microsoft barely tried to uphold this by giving a free Bedrock license to alpha buyers for a limited time several years ago. I suppose you'd have to sue them now if they break it, and the judge will wonder why you bothered to bring a $20 dispute to court.
This is a story from some parallel universe right here. You bought a Microsoft game and weren't able to run it on Windows, but it works on Ubuntu?!? I almost spilled my coffee reading this.
The problem was you probably ended up falling for the UWP version instead of the Java version. The Java version remains the community accepted “proper” version to this day.
Yep, sounds like my experience. Years ago, we migrated off Rackspace to Azure, but the database latency was diabolical. In the end, we got better performance by pointing the Azure web servers at the old database that was still in Rackspace than we did trying to use the database that was supposedly in the same data centre.
I kicked up a stink and we migrated everything to AWS in under a week.
That highly depends on what services you're using.
We migrated from AWS to GCP in 2016/2017 (mostly VMs and related stuff, CloudFront, etc. - no lambdas) and it was pretty painless; everything worked smoothly until the end of that company.
Is it because there are features that AWS provides that are not available in GCP, or just the fact that setting up exact replicas of processes is hard for migrations like these?
That sounds to my self-hoster ears as an expensive way to do self hosting. Isn't at least half the point of AWS to use their SaaS thingies that all integrate with each other, I think people now call it "cloud native software"?
Not that most of our customers whose AWS environment we audit do much of this, at least not beyond some basics like creating a vlan ("virtual private cloud"), three layers of proxies to "load balance" a traffic volume that my old laptop could handle without moving the load average above 0.2, and some super complex authentication/IAM rules for a handful of roles and service accounts iirc
(The other half of the point is infinite scale, so that you can get infinite customers signing up and using the system at once (which hopefully pay before your infinite AWS bill is due), but you can still do that with VPSes and managed databases/storage.)
The point of moving to AWS is often to benefit from their data centers and reliability promises. So VPC, EC2, IAM, maybe S3 have a clear point.
And one small note: apart from S3, virtually all AWS services are tied to a VPC, any kind of deployment starts with "ok, in what VPC do you want this resource?".
You can get 95% of the reliability for 10% of the price at any dedicated hoster. AWS just figured out the magic word "cloud" means they can charge you 10 times the price.
At Azure or GCP you pay a similar price but you don't even get the reliability so literally why would you use them? The only reason I see is that "cloud" means you can start instances at any time without a setup fee or contract duration. But with the amount of cost difference, you could have three times your baseline "cloud" load running all the time at a non-cloud hoster, and still save money!
It is an expensive way to do self hosting, yeah! I guess one reason is, sometimes it’s easier to just use one of big N clouds – e.g. if you’re in a regulated industry and your auditors will raise brows if they see a random VPS provider instead. (Or maybe not? If you’re doing that kind of audits I’d love to hear your thoughts!)
> Isn't at least half the point of AWS to use their SaaS thingies
It is. (That’s how they lock you in!) I think it’s okay to use some AWS stuff once in a while, but I’d be wary of building your whole app architecture around AWS.
I’m in the self-hosters camp myself :-) I’m building a Docker dashboard to make it easier to build, ship, and monitor applications on any server: https://lunni.dev/
This is sort of petty, but… your web page does the horrible scroll-and-watch-the-content-fade-in thing. This is annoying and makes me want to stop reading. It also makes me wonder if your product does the same thing, which would be a complete nonstarter. Seriously, if I’m debugging some container issue, I do not want some completely unnecessary animation to prevent me from seeing the dashboard.
Thanks for the feedback! No, the product’s UI is definitely on the pragmatic side: I think the only blocking animation we have right now is dialog windows sliding in (and we try to avoid these altogether!) Both the landing page and the app disable animations when prefers-reduced-motion is enabled.
I’ll rethink the landing page animations a bit later! (I was thinking about redoing it from scratch again, anyway :^)
I’ve just dropped you a note in the chat. We’re also on Swarm, and dealing with most of the stuff you address, and some more. Would love to contribute to Lunni, if you’re open to that :)
This is GCS implementing the S3 API incorrectly in a way that really ought not to break clients, but it’s still odd because the particular bug on GCS’s end seems like it took active effort to get wrong. But it’s also boto (the main library used in Python to access S3 and compatible services) doing something silly, tripping over GCS’s bug, and failing. And it’s AWS, who owns boto, freely admitting that they don’t intend to fix it, because boto isn’t actually intended to work with services that are merely compatible with S3.
As icing on the cake, to report this bug to Google, I apparently need to pay $29 plus a 3% surcharge on my GCS usage. Thanks.
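For context, this is roughly how one points boto3 at GCS in the first place (placeholder credentials; GCS issues HMAC keys for exactly this S3-interop mode) - and it's this officially-unsupported configuration that trips the bug:

```python
import boto3

# GCS exposes an S3-compatible XML API at this endpoint when used
# with HMAC credentials, which GCS issues for interoperability.
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="<gcs-hmac-access-key>",
    aws_secret_access_key="<gcs-hmac-secret>",
)
print(s3.list_objects_v2(Bucket="my-bucket").get("KeyCount"))
```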
> As icing on the cake, to report this bug to Google, I apparently need to pay $29 plus a 3% surcharge on my GCS usage. Thanks.
That's the price of a support contract, not a "bug report". And it's not "plus", it's "or": support costs $29/month or 3% of your monthly billing, whichever is greater. It comes with SLAs for fixing or working around your reported problems. Though obviously in this case they'll probably just tell you to use their own Python library and not boto.
Oh, thanks Google, if my cloud spend is more than $967/mo, then I don’t get dinged by the $29 minimum. But this is the price of a bug report, because I can’t file a bug report without paying it.
And this situation is bad business. Google advertises that GCS has S3 interoperability support. And they have customers who use it in its interoperable mode. Presumably those customers could use GCS’s biggest competitor, too. Shouldn’t Google try to make the S3 interop work correctly?
GCS and AWS are commercial products for which you pay, not open source projects you can expect to support you for free. I don't know what to say here, if you have something "serious" to do on these platforms then a 3% overhead for support seems like an obvious choice.
Honestly it seems to me like you're excited to have found a bug and want to report it for glory; we've all been there. But No One Cares about that stuff in the world of commercial software. They fix bugs for real customers, not internet rock stars. If you aren't losing even $29 (one mid-tier meal!) from this bug, well... does it even rise to the level of "yell about it on HN"?
It is also the reason to use proprietary bullshit services. If there were no utility gap, then a reasonable evaluation would conclude that a migration to them is not worthwhile.
The S3 system is proprietary to Amazon, and it's your fault if you're not using Amazon but you're relying on Amazon to not change it anyway, because they have no obligation to you.
The concept of object storage is not proprietary. You should be able to change your code to use a different object storage provider.
> Fast forward about a year, and countless hours spent by both my team and Azure solutions specialists (kindly lent to us by the Azure org itself), and we all agreed the six-figure bill to one of corporate daddy's largest competitors would have to stay!
Being tied to AWS and being unable to shake off a huge bill is not a trait of its competitors. It's a trait of AWS, and stresses the importance of not picking them as your cloud provider.
Also, I think it's unbelievable that a monthly 6-figure invoice, charged to a company with cloud engineers already on their payroll, is not justification enough to shift their workload elsewhere.
Low 6 figures is just a dev. If a team of 5 devs has to work on the problem for 5 years, then it will pay for itself in 25 years. Likely beyond any planning horizon of a company with yearly performance evaluation cycles.
In Europe, even the likes of Amazon pay their SDEs 70k/year. In Sweden, for example, Microsoft pays its SDEs south of 800k SEK, which is about 70k dollars/year.
Low 6 figures is an entire team of Microsoft SDEs working full time for a year.
Interesting, because until now I've held the same opinion in reverse.
Each case is special in how everything gets configured, but between the Azure, GCP, AWS, and IBM clouds, the smoothest experience in my case has been on Azure, based on Java and .NET technologies.
And we also have our share of support tickets across all of them.
Now, Azure back in its early days, 2010-2016, was kind of rough; maybe this is the timeframe you're referring to?
Do you think it would be any different/better if you had to migrate to GCP (for example)?
Do you think it was the migration itself or the services on Azure?
Having worked with all three, there are certainly things that suck about all of them, but I've found AWS "most reliable", though it also seems to have a large number of disparate services needed to do things that were simpler on Azure.
GCP was pretty meh, but depends on what services you used.
Azure is a good choice for .NET and SQL Server (Azure SQL or whatever it is now), but I'm not sure a service built for AWS is going to "just work" on Azure (or vice versa).
After using AWS and Azure extensively, AWS seems to be quite well engineered by some very smart people. The isolation between regions is extremely good, and the Availability Zone model is quite effective for building very reliable systems if you are willing to pay the cost of inter-AZ data transfer. My company has an Active Directory controller in 3 different AZs.
Azure is a mess designed by smart people with no time and little budget. Azure flat-out lies about the AZs they have, claiming two halves of one data center are two AZs.
That's okay, Microsoft will rename it to something else and completely change the admin UI and APIs next week. It will now be called Dynamics CoPilot OneAI 365 for Business OneCloud.
In my experience (worked for organizations that used everything from on-prem server racks, to Linode to AWS to Azure), complaints about cloud infrastructure are proportional to managed service usage. I rarely hear teams that largely rely on virtual machines (perhaps with a managed RDBMS) complain. They do have to maintain a little extra scripting, but that's a minor inconvenience compared to battling issues and idiosyncrasies of managed services.
I'm sure it's gotten better now but back in 2016 provisioning VMs in Azure took so long that we joked that every time you provision an instance a Microsoft engineer gets in a car to buy the server.
Reminds me of how my Swiss bank doesn't support transfers outside business hours. I have to imagine when I click "send" in the UBS app, some guy named Hans-Ueli receives a tape printout and goes into the basement to move some silver pieces from one drawer to another.
If everyone in this thread shitting on Azure is going off how it worked in 2016, the comments here make a lot more sense. I know "Microsoft bad" still lingers in online communities, but I have to say I’m surprised hackernews is still this anti-Microsoft. In my experience, both Azure and AWS have their issues; it’s not like AWS is some perfect offering, but you’d think that based on the comments.
I'm a little confused by this post. Obviously it's easier to maintain a plain VM than managed services. That's why people are paying a lot more money to the cloud providers for managed services, so they don't have to do it themselves. What you're saying is that this is essentially a pointless endeavor? I don't think this statement is entirely uncontroversial, since managed services are the main reason for many companies to migrate to cloud.
Using managed services is not a pointless endeavor – they can save you a lot of time (and therefore money).
Unless you need to switch providers, at which point it may take more time to adjust for differences in how those managed services operate.
Managed services are absolutely not the main reason for moving to the cloud. Companies do it for the flexibility that comes with renting the real estate/energy/hardware instead of owning it.
Yes! The longer response is that the closer you stick to standards, the easier a time you will have. VMs are a standard, with cloud-init and image formats, etc.
i.e. in 2025 managed Kubernetes is not _that_ different between providers
Heh, until you need to roll back a specific table in Postgres using their backup solution. IIRC, this is possible in AWS -- or at least, I'm 99% sure you can at least download the backup. In Azure? All you can do is restore the entire database, and you cannot download it.
I mean, if it was solely about renting machines, we’d all just use DigitalOcean, or EC2 on AWS.
People use things like RDS and EKS/GKE to avoid all the administrative overhead that comes with running these things in prod. The database or its underlying hardware has a problem at 1am? It’s Amazon engineers getting paged, not you (hopefully… assuming the fault hasn’t materialised into operational impact yet).
I've never seen a managed IaaS that saved time. It is marketed as something that can free you from hiring ops people, but you will absolutely need to hire some supplier relations people to deal with it. (And contract optimizers, and internal PR to deal with the fallout.)
It's different for fully featured SaaS. It's a matter of the abstracted complexity vs. interface complexity ratio that is so common for everything you do in software.
It's easier for the provider to maintain a VM provision. It's supposed to be easier for the customer to maintain managed services, but that's often debatable.
Cloud services are great when they work, but when something isn't working, you have no way to debug anything except maybe restarting the service, if that's even possible.
We had one customer that needed an IPsec tunnel to the VPC where the production servers were living. We didn't want to maintain such a setup just for a single customer, so we checked AWS's offerings, and look at that, they have a managed IPsec solution. Great.
Until the client called to say the tunnel was down, and the solution was for them to restart it on their end to resume the connection. Why? You can enable some logging to S3, but according to that, everything should work. What were we supposed to do next?
But even if you stick to just EC2, things can get weird. Our recent incident: an EC2 instance stopped responding, but the ASG didn't replace it; any action on it threw an error that the instance was not running, yet it was in the running state.
I wish I could better help my org see that. Luckily my boss agrees with me, but he's not in full control. Between the vendor lock-in, and the _almost but not quite api compatibility_ with OSS... I just dread as more teams adopt it.
Azure's anti competitive conduct is also the reason that AWS stopped lowering prices.
Before 2014 or so, AWS would periodically reduce prices on major services passing on falling technology costs.
Azure didn't like that, so they aligned their prices to AWS's, matching immediately the same discounts on the same service.
This is a form of predatory pricing, because the goal is to kill the incentive for competitors to reduce prices by denying them market share gains when they do.
We bought a company hosting on Azure. They used hosted Postgres and are hosting .NET services on Windows. Small infra, in the range of 200-300 cores and 1 TB of memory.
Every few days, M$ randomly shuts down random instances for maintenance or disconnects the network for >10 minutes.
We migrated off hosted Postgres because performance was a tragedy - then their India-based expert led us to use a different volume type, and after an instance restart the database didn't start up because of I/O latency. The expert hasn't wanted to meet for 3 straight days now, because he is busy. The RCA (half a page, written probably by some LLM) says it's not their fault, but the charts tell a different story.
The only thing where they beat GCP and AWS is a dashboard that loads everything so quickly... sadly, you can't, e.g., run 2 similar network operations in parallel, because they will fail or take 10x the time they would take when run one after another.
Of all the PaaS providers, Azure has the worst abstractions and services.
In general I think it’s sad that most buy in to consuming these ”weird” services and that there’s jobs to be had as cloud architects and specialists.
It feeds bad design and loose threads as partners have to be kept relevant.
This is my take on the whole enterprise IT field though!
At my little shop of 30 or so developers, we inherited an Azure mess, built abstractions for the services we need in a more ”industry standard” way in our dev tooling, and moved to Hetzner after a couple of years.
A developer here basically knows no different - our tooling deals with our workflows and service abstractions, and those shouldn’t change just because of a new provider.
1/10th of the monthly bill, and the money partly spent on building the best DX one can imagine.
Great trade-off, IMO!
Only two cases come to mind for using big cloud:
- really small scale: mvp style
- massive global distribution with elasticity requirements.
Two outliers looking at the vast majority of companies out there.
I am part of a team building an automation tool for cloud provider creation of Projects/Accounts/Subscriptions (depending on provider). Our primary provider is GCP, and implementing that was fairly easy. Some gotchas, but easily surmountable.
Now we have gone multi-cloud to Azure and we need to add support (we were historically on AWS but moved 95% off; we still have some teams on there, but we rarely build tools for it outside of Terraform modules). And the Azure API, MS Graph API, and the Go SDKs for both are the biggest piles of trash I have ever worked with. Everything is a pointer, even string literals need to be made pointers, but sometimes they aren't....
Documentation is inaccurate. Some APIs take just the ID, others take a full path. Some of it is documented; many APIs have the wrong one documented.
None of the APIs return related resources' IDs, you have to search for all of it. So many name-based searches. I had to add a caching layer for IDs during creation so I didn't have to look up the same resource over and over (we use a state machine for creation and it can be resumed midway, and other fun things, so we need a lot of checks and resume-based code).
Overall it is the worst designed and implemented cloud provider. I would never recommend or choose it if given the power.
I really wanted to like Azure because of how well it integrated with the rest of my tools, but I kept getting hit with VM availability limitations and UX quirks. I've never had issues getting machines in AWS, or feeling like my actions were taking effect.
I've also waffled several times on the Azure FaaS offering. I am now firmly and irrevocably at "Don't use it. Run away. Quickly.". The experience around Azure Functions is just too weird to get comfortable with. Getting at logs or other binary artifacts is a gigantic pain in the ass. Running a self-contained .NET build on a blank windows/linux VM is so easy it doesn't make sense to get in bed with all this extra complexity.
Ugh, yes. Lack of availability of resources in whichever region I happen to need them.
Also, things that break automation, like calling back to say your SQL server is up and running when in fact it’s not ready for another 20 minutes. I am half sure the terraform time_sleep resource was written specifically to counter Azure problems.
You missed the perfect middle ground between serverless and mouse-configuring an IIS VM: Azure App Services. It's the same service function apps are using once they advance beyond the trivial function and require longer runtimes or no spinup delay.
App Services takes some getting used to, but it's a locked-down Win Server/IIS container with built-in FTPS, self-healing healthcheck endpoints, deployment by pointing to a repository, auto-scaling options, and a 99.95% SLA.
A few years back, it was a bit of a dog performance-wise, but the modern CPUs have been no problem for a 2+ vCPU, Premium level SKU. Pricier than a VM, but dealing with security and updates for a webserver VM is a ton of work.
I believe any Azure user might be able to compile their own 100 reasons not to use Azure, and the same will be true for most big pieces of software.
Even as someone that had minimal exposure to other clouds, I could easily see how Azure user experience lags due to the lack of proper care.
The number of pages with a filter bar that won't work properly until you remember to click "load more" should clearly be zero at this point; this is an objectively bad pattern that has existed for years and should be "easy" to fix. But the issue will probably never be prioritized.
The fact is that unless tackling those issues is part of the organization's core values, or they are clearly hitting the revenue stream, they won't be fixed. Publicity and visibility of those issues will always be crucial for the community of users.
I have significantly more experience in AWS, but I've spent equal time building and securing infrastructure in Azure for at least two years now. While AWS is not without its rough edges, I'd pick it any day.
My number one concern with Azure is availability of resources. Working within US regions, we've had to shift regions during production rollout because one or more of the resources we needed -- a current gen Azure SQL database or App Service Plan -- were simply not available. Rolling out an inexpensive VM (think equivalent of a t3/t4g.micro) is always a ride too, between unavailable SKUs or excessive quota gatekeeping.
Spending gotchas exist on any cloud, but we also know someone who got caught off guard in a completely new way recently. In late December, the team needed to automate a database event once per day on an Azure SQL instance. Scheduled jobs aren't natively available inside Azure SQL, and so they reached for an elastic job agent. Everything went smoothly until someone dug into a price increase on the January bill and asked why Sentinel had jumped from under $200 to over $3,000.
A colleague and I helped them dig in and quickly discovered that the controller for the elastic job agent runs dozens of batches per second in order to schedule that one job per day. With default security audit settings in Sentinel to meet compliance obligations, this generates over 600GB of BATCH_COMPLETE log messages per month, at a cost of $5/GB for ingest - and 600GB × $5/GB is right where that $3,000 bill came from!
A vastly underrated cloud, if you’re a small company and don’t operate containers, is Cloudflare. I know they get criticized for other reasons, but their DX is actually really great if you’re tired of the big 3 (4?).
IME even a small company runs into something you need an actual server for, and then you're suddenly spread across two clouds because Cloudflare is serverless or nothing.
Yeah, I don’t understand why Cloudflare isn’t competing head-on with cloud providers. Imo they should acquire fly.io, give them a blank check and 2 years, and I think they could take down AWS.
The reason AWS is dominant is that it’s the default and a known quantity. But developers and cost-conscious organizations will look at alternatives. Not saying it would be easy, but the prize would be huge. Plus, AWS seems in chaos, with a lack of sensible leadership.
Same, I see it as the single missing piece in the puzzle. If they had a VM product then for small businesses it'd be such an obvious choice, especially if they provide anywhere even just half the nice integration with their other products as they do with workers.
I'm guessing it's just because they're so extremely all-in on every single one of their products being on the edge, and you can't make that work with VMs without becoming far more expensive than any of their customers would pay.
Or maybe I'm clueless. But it sure looks that way from the outside.
I moved off a fleet of VPSes onto Cloudflare Pages and subsequently back to VPSes again due to unpredictable latency, several cases of downtime in 12 months and weird bugs around static assets disappearing long before the advertised retention date for old deployments.
It’s a JavaScript/Node-like environment only right now, no? I love it for my Svelte site, but it’s a massive limitation to be locked in based on language and request-response-based constraints right now.
You need at the very least containers and persistent volumes to be interesting to me at least.
We ran our whole platform, written in Rust, on Cloudflare Workers. It was not a great experience. You need to use their SDK, which had really interesting bugs that never got fixed (we always forked a version). It was pretty hard to test anything locally; you just had to deploy your code to their platform, which took time and made the feedback loop so slow it blocked us from delivering features fast enough.
And yes, you can test your local Rust code. It works nicely on your machine, but breaks with a really nasty error on their platform.
The target is `wasm32-unknown-unknown`, which leaves you with `fetch` as your only source of IO. OK, their Workers have a hacky socket implementation nowadays. Non-standard, of course. And most of the ecosystem won't really work without forking everything and fixing the bugs yourself.
We pivoted to a native Rust project. We still have one worker running in Cloudflare. We isolated that code from the workspace so that renovate updates will not touch it. You know, a random version upgrade might break the service...
Interesting choice to support Rust over Go, if you ask me. I don't have numbers, but I don't really peg Rust as a popular language for serverless web apps, certainly not to the extent of Go.
I work in both AWS and Azure and let me tell you, one thing I absolutely love about Azure is their portal. It’s like AWS 2.0 where all the cloud cruft is abstracted away and all that is left is the knobs you actually need to turn, and how they relate to one another.
I love me some AWS, but my god every time I have to dive into an unfamiliar environment and try and reverse engineer how everything connects - I need a drink afterwards.
I’m having a really hard time believing you are serious. The one time I tried out Azure for a few days, the portal was absolutely painful. Every click would take 5-10 seconds for a response. Sometimes basic settings-change actions would take 2+ minutes of watching an Ajax spinner. How can anyone enjoy working like that???
Sure, the UI is sluggish, but at least you don't have to move through three different "services" to find the routing table that your VM is using.
AWS UIs are generally snappy and smartly designed individually, but they are horrendously organized at the general level. AWS is built as if you are exploring a relational DB containing your resources, instead of a deployment tree.
Your VM doesn't have a NIC in AWS, it has a foreign key to your entire VPC's NIC table, which lives in the VPC service, not the EC2 service. And then your NIC doesn't have an associated subnet, it has a foreign key to the subnet. And then when you get to the subnet table, you look up the routing tables table, and finally in the routing tables table, you'll find the settings for the routing table. This all works through following links, but the constant context switching and tabs upon tabs that AWS UI requires are extremely unpleasant for me at least to use. I'll take Azure's sluggish UI that organizes all of this in one-two pages instead of four any day.
AWS pages are built by different teams, and it shows. We're all supposed to use IAC though, right?
In all seriousness, even in the face of IAC, the one thing Azure can do that AWS can't [at the time this happened to me], is have a global view of everything that's running right now and costing me money. It was years back, it was a $5 bill, but the principle of it had me livid. I did my best to tear down everything after my evaluation, yet something was squirreled away costing money.
So yeah, absolutely, sluggish UI all the way (I also find the Amazon storefront profoundly ugly and disorganized).
ENIs are under EC2 in the console, not VPC; on the API/CLI they're all under ec2, together with all networking.
If you click an instance and go to its networking tab, you get a list of ENI IDs that are clickable links to the resource; same for the VPC and subnet. If you click the subnet, you can just click the route table tab, so from an instance's networking tab the route table is 2 clicks away.
But rather than doing this, you could use Reachability Analyzer, which lets you check routing tables and security groups for a source and destination IP/resource and port, on the same or different VPCs connected with peering or TGW, and it will tell you if you're missing routes or SG rules in either direction. I created a Slack bot that let our devs input src/dst IP/domain and port, and it used this API to do the check for them - saved a lot of time troubleshooting.
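For the curious, such a bot boils down to a few Reachability Analyzer calls - a boto3 sketch with placeholder IDs (the analysis is asynchronous, so you poll for the verdict):

```python
import time
import boto3

ec2 = boto3.client("ec2")

# Describe the path to check: source, destination, protocol, and port.
path_id = ec2.create_network_insights_path(
    Source="i-0123456789abcdef0",       # placeholder source instance
    Destination="i-0fedcba9876543210",  # placeholder destination
    Protocol="tcp",
    DestinationPort=5432,
)["NetworkInsightsPath"]["NetworkInsightsPathId"]

# Kick off the analysis, then poll until it finishes.
analysis_id = ec2.start_network_insights_analysis(
    NetworkInsightsPathId=path_id
)["NetworkInsightsAnalysis"]["NetworkInsightsAnalysisId"]

while True:
    analysis = ec2.describe_network_insights_analyses(
        NetworkInsightsAnalysisIds=[analysis_id]
    )["NetworkInsightsAnalyses"][0]
    if analysis["Status"] != "running":
        break
    time.sleep(5)

# False means a route or security group rule is missing along the path.
print("reachable:", analysis.get("NetworkPathFound"))
```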
I had an absolutely horrendous time working in Azure a few years ago (as a network engineer), we did have quite a complex setup with custom route tables and Azure Firewall though and VPN connectivity between Azure and AWS, but stuff like their VPN gateway taking 40+ minutes to change instance size on, wtf? I've filed 2-3 bugs to AWS in the almost 10 years I've worked with it, all for newly created APIs/services, they were all fixed within a week or two. I filed 8+ bugs to Azure in the first month using them, none of them were fixed as they had workarounds instead. And their documentation is absolutely useless, I could never trust that I understood what I read correctly, I always had to verify that it worked that way by testing it.
must be a paid troll, no self respecting intelligent engineer would find the Azure portal good. it’s horrible ux, really convoluted and complicated, very unintuitive, horizontal scroll is a joke when the web scrolls vertically, tiny fonts making everything hard to read and screens overloaded with so much shit and yet they managed to not put on the screen the main thing that developers would care about. it’s a complete joke
My god, I've been sending feedback about this shit for 3 or so years. You can't open any-fucking-thing in a new tab. The funny thing is, it used to work and they fucked that up.
I don't understand how dumb you must be to design a web site that way. It's like a brewery that sells their beer in plastic shopping bags and thinks that's good.
I think OP is trying to differentiate between Azure APIs, which are unbelievably slow and horrible, and the UI design itself - the layout, the font, how one screen will flow to another screen, what links to what, how it would be laid out in a tool like Figma.
Azure's APIs are atrociously slow. Azure's UI design is pretty nice. There's not much the UI designers can do about their API colleagues.
I know we're talking about AWS and Azure here, but had to add that fwiw, the M365 admin interface(s) are so bad it practically feels like a prank. In other words, it's as though someone is purposely making them as chaotic as possible to what end I can't even guess.
I think it was the Intune interface I was in the other day that had the same link underneath 4 different sections of the dashboard, which I noticed when I had all 4 of them expanded at once. That got a good laugh out of me.
"Here...don't miss this settings page! Seriously! Look!"
Not familiar with the product. The MS name alone would make me biased. Is it really good, or even better in some way? Or did you just mean, slightly ironically, that they did not manage to make it worse than competing products?
A great thing about mice is that they are fungible and don't change without the user's consent, unlike software, so you can keep buying and using the same mouse forever.
The average mouse of the time was blocky and uncomfortable.
This comment reads like rage bait (I am not saying your opinion is "invalid" or that you're lying). I've never met anyone who likes the Azure portal lol, even people who live inside of the Azure ecosystem hour by hour.
The Azure portal has some nice ideas - in theory, being able to divide stuff into "resource groups" works a lot better than the AWS approach of "divide resources that should be isolated from each other into separate sub-accounts".
In practise, even the good ideas are implemented poorly.
That would have been nice. AWS kept sending me bills for $0.00, and after multiple tickets over a couple of years, I finally deleted my entire account due to how pathetic their support was (they never figured out which service was active, and I couldn't find a way to figure it out using their UI).
AWS is actually great once you've spent a few dozen hours in the service. If you are using it for the first time, GCP feels a lot smoother, but then you begin to hit corner cases, and GCP just breaks in those. Azure is bad the first time and gets worse over time.
My experience too. I could never find anything even when I knew it was there, and I was told by my boss to use it. I stood there for half an hour, credit card in hand, then went to AWS, where the equivalent can be located by mortals. It's like they don't want anyone's business.
Yeah I don’t get this thread at all. I’ve used both fairly extensively and while Azure’s dashboard is still a pain in the ass, it’s better than AWS by a mile usability wise. Not to mention, Microsoft clearly puts time and money into their documentation, while AWS docs have always sucked.
Most of my complaints about Azure come down to the UI. So many head-scratching moments, and if you don't have a 4k monitor, lots of scrolling of menus inside of menus.
How do you find the price difference? Whenever I've done comparisons, they have always worked out significantly more expensive than AWS, and AWS is already pretty damn expensive for anything that requires a decent amount of compute.
I’m in the same boat. To me it’s bad UX. I can never find anything and it just looks way too complex to use. It shouldn’t be this way but as someone that uses Microsoft Entra, I guess I’m not surprised.
Not on laptops, which you'll be using if you are on call. It's even worse if you don't have the eyes of a 20-something-year-old and have the text scaled up a bit.
As an Azure Ops person spearheading a migration off AWS and another cloud, it's pretty funny. Some of it is nitpicky, some is sharp edges I've cut myself on, and some is Microsoft refusing to update the required TF version because they are bending over backwards for compatibility reasons, which is beyond frustrating.
However, all the comments about the Portal are baffling to me. The AWS portal is just all over the place; I feel like people are expecting AWS awfulness, and when a portal tries to be consistent, it breaks people's brains.
Oh yeah, Day 313: a Public IP to put into DNS. Alias record that, you noob. :P
For what it’s worth, I’m a student, and have had the benefit of seeing both the AWS and Azure web interfaces for the first time in the past couple of years. Azure was astoundingly more intuitive and less bizarre than AWS as someone with no experience working with the big clouds.
Even doing classwork involving AWS was an exercise in frustration. I couldn’t actually believe the sort of button trails and on-hover menus I was told to use to access various functions.
I don’t have the experience to evaluate the technical functionality, or whether AWS’s interface is better for experienced users, but I can definitely say it was far less approachable as a novice.
Yeah I don’t get this thread at all, AWS is a usability nightmare, as are all Amazon products. Microsoft products aren’t great usability wise but they’re clearly better than Amazon imo. I have a feeling a lot of the commenters here last used Azure when it was still in beta and the dashboard was live tiles Win 10 style.
Last time I tried to use Azure, it did not even offer domain registration. This was many years after launch - and not too long ago.
Not sure if this was an Australia/Oceania limitation - or just an ongoing product limitation.
My requirements weren’t complex. I needed to manage my domains (not AD), spin up virtual machines, and associate the two.
I also found the UI, overall, tedious. Finding the right offering under their ambiguously named services was difficult. And this comes from an AWS user.
I wanted to like Azure, but for at least the reasons above, it's not the product for me.
Did you want domain registration, or just DNS management? Those are two very different services. They offer the latter but not the former. So while you (generally) have to buy domains elsewhere, you can then manage them entirely within Azure after doing so.
For me, Azure is very good as an identity provider, doing SAML/OAuth/OIDC for in-house and 3rd party applications. It works wonders and is very cheap. I mean, I think it is the best IdP out of all the SaaS offerings on the market.
The cloud part (VMs, k8s, etc.) is something that I touch only if I am forced to. Even creating a VM is way more complicated than it should be.
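To give a feel for why the IdP side is pleasant: everything hangs off the standard OIDC discovery document, so a client can bootstrap itself from a single URL. A minimal sketch in Python with requests, where the tenant ID is a placeholder:

    import requests

    TENANT = "00000000-0000-0000-0000-000000000000"  # placeholder tenant ID
    url = (f"https://login.microsoftonline.com/{TENANT}"
           "/v2.0/.well-known/openid-configuration")

    conf = requests.get(url, timeout=10).json()
    print(conf["authorization_endpoint"])  # where users go to sign in
    print(conf["token_endpoint"])          # where the app redeems auth codes

Because it's plain OIDC, any off-the-shelf client library works against it the same way.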
I'm curious why people pick Azure, if anyone here has direct experience with making the decision.
I work at a startup that runs on Azure, and we're only here because of Microsoft's monopolistic behavior. We switched because Microsoft gives Office 365 discounts to our customers as long as all the SaaS services they use are hosted on Azure, and so our customers demanded we use Azure. Part of the monopoly playbook: "using a monopoly in one area to create a monopoly in another".
I used to work at GCP, and I thought it was almost shameful that we were in 3rd place behind Azure. Now it just makes me mad (especially since I had to migrate our startup from GCP to Azure).
Pretty similar reason: customers use Azure, so the incentives are in place to run more things on Azure.
Case in point at work: we need to set up Azure infrastructure per-customer. Hitting the Azure RM endpoint from outside the Azure network is not reliable; the API endpoint's DNS record points to one of two IP addresses in westus, and when the DNS record flips (presumably for blue/green deployments) the no-longer-referenced IP address immediately aborts the connection. The official Azure Terraform provider throws an error when this happens and it usually results in Terraform state losing track of something that it already created. Azure support just says "well all we see is 200 OK from our side".
The "solution" is to run the Terraform workload from within Azure. The SLA is only really guaranteed if you're connecting to the Azure RM API from within Azure. Cue the insanity.
In my case, the company I work for isn’t a software company. The bean counters / IT group would rather just have something tacked onto their existing Microsoft subscription vs. something entirely different.
Also, I suspect part of the reason people are hesitant to use GCP is that Google is perceived as a company that will gladly kill products off on a whim. Not great for something mission-critical.
From what I've seen, when people pick Azure, it's always about money.
Whereas with other providers, it can also be about money (a big discount, or a cheaper migration partner), but it can equally be about wanting one service/feature that only X provides; and once you're in, people tend to prefer to put everything there instead of doing poly-cloud.
In my region, Azure salesmen are very active, providing huge discounts, so Azure is the most popular amongst big companies.
Meanwhile smaller ones will go on AWS because it's easier to find information and (actually) knowledgeable people.
I used to work at a company using AWS: everything was managed through Terraform and we were as cloud-agnostic as possible (mostly containers).
Then we were acquired by a bigger company with an Azure deal, so they told us to migrate from AWS to Azure.
They provided us with their own experts to help us, but six months in, we were still unable to have anything remotely viable for UAT.
The experts were starting to acknowledge that even with their years of experience, they still weren't convinced by this whole Azure thing, so they actually relied heavily on a legacy on-prem DC.
That's when I left.
Last time I heard about old coworkers, the product was still running perfectly fine on AWS, while there was still a team working on the migration.
It's been more than two years now.
And I had other bad experiences with Azure.
I know that cloud providers are not fun if you don't start with two weeks of training, so I try to stay open-minded, but no matter how many Azure experts I talked to, I never found one who was actually confident in using it.
- we use mainly GCP
- we do not want to use AWS because of a random political issue (absurd, in my opinion, but whatever)
- we are getting fleeced by GCP and would like an alternative to help keep the price "acceptable"
How does this work? Do they demand that you use Azure for your servers so they get a discount? Or do you have to create instances of your product in e.g. VMs that are put in their Azure account? Did you have to completely leave every last bit of GCP behind? How is this checked by MS?
What you call "monopolistic behavior" a senior/lead engineer with a brain will see as ecosystem compatibility. If you are working within an org that already uses Microsoft for everything, why the hell would you bother introducing a new stack everyone else will have to learn, rather than use the Microsoft offering if it's competent enough? On top of that, the Microsoft product will most likely work more nicely with other MS products. Same reason people buy iPhones and Macs and stay in the Apple ecosystem. Yeah, it's not as hip and exciting, but enterprise development is rarely hip or exciting. At a startup not already using MS products, yeah, no shit, you can use whatever you want with little to no consideration for compatibility within the stack, especially when your main goal is cost savings.
At my previous company, we lost countless hours troubleshooting a Dockerfile that worked everywhere except on Azure. It used Node 18 as the base image, and the solution ended up being a chown 0:0 on everything inside the container - which took an absurd amount of time during every deployment.
Yes, I believe Docker/AKS somewhat recently defaulted to least privilege for container users, so you end up having to explicitly grant access to every little thing...
I am constantly forced to use Azure by idiotic companies which use .NET and the entire .NET monoculture which fetishises Azure, and I can say with a clear conscience that Azure is the shittiest, dumbest, most ill-engineered clusterfuck of a cloud that has ever been unleashed on developers. It's so bad that in the last 5 years even some of the most die-hard C# shops in the UK have changed their leadership and started to move away from Azure, because they cannot afford to ignore the absolutely insane state of it. Literally at every juncture where Microsoft could have gone with a feature in Azure one way or another, they somehow managed to not only pick the worse of the two, they somehow managed to bastardise it even further, beyond anyone's imagination.
I'm running two k8s clusters with 6 vCPUs and 12 GB of memory each, and when we got the quote from Azure it wasn't pretty. The managed PG was a deal breaker for us.
P.S.: side gig with 1 QPS on the EU cluster and way less in the US.
P.P.S.: happy Digital Ocean customer
Sometimes I create a web service for fun in a single PHP file, using PDO to connect to a database, and then upload it via SFTP to my server. It scales pretty well unless I'm becoming the next largest tech company in the world.
Lol. I would follow an account like this for something I use a lot. Like C# as a language, Unity, hell, even Chrome. Has the energy of the "Linux Sucks" talks. Usually the undercurrent is that something that can have so many reported faults is something worth having a love-hate relationship with :D
That said, I only use Azure for redundancy. Hosting the one app I have there on anything but Azure would be pretty much infinitely cheaper, especially when my IO quota goes above ~1 GB a month, which with Azure instantly forces a switch to a $20 plan for the month.
Ah, most of the business guys are uncompetitive and make money just by being in already-lucrative businesses. That's why there are solutions built on Microsoft technologies.
I quite like Azure. Microsoft was also way more responsive than other cloud providers when we were looking to shift providers, we got free 3rd party consultation to setup our infrastructure in azure, it all ended up being more for less for us. It's all setup using infrastructure as code which is pretty maintainable and easy to add new stuff. Almost everything can be done via command line. Don't really use the portal UI to set anything up, but we do use it to look at the state of things. Haven't really had any problems with any of the services.
Boards works for basic Kanban projects, but if you want to delve into scrum stuff like sprints, burndown charts, etc., it's very bad and cumbersome. You'll have to do a lot of stuff manually that Jira does automatically.
Wiki... It's not good. It's extremely slow, and lacks a ton of features that Confluence has.
Pipelines is godawful, and has been suffering severely from the migration from their "Classic" pipelines to YAML. The funny part is that if you go in depth into YAML pipelines, you'll notice there's a very large number of things that aren't configurable via YAML. It also has a ton of bugs, many of which have been open for over 5 years. To make matters worse, it's currently in an identity crisis with GitHub Actions (which has more features and is continuously getting new ones ahead of Pipelines).
I don't know what the future of Azure DevOps is; honestly, I feel like they'll eventually shutter it and move everyone to GitHub Enterprise.
Azure DevOps contains a lot of things, like a Jira equivalent and Azure Pipelines, etc. The Jira-equivalent interface is confusing, but it's not a showstopper; you learn to live with it.
We use it daily. The other option we had was a combination of self-hosted GitHub (which at first didn't have Actions), Jira, and Confluence. When Actions was not yet available, ADO was used for pipelines, so that was 4 services.
Give me 1 integrated service built on the same stack as Azure any day. Built-in service connections, managed identities, etc.
Been using Azure for a few months now, mainly the AI part (AI Foundry, AI Search), and it feels like a product run by juniors with no guidance. One day, the entire PromptFlow feature was down - no status, no info.
To create a new deployment (which is basically a PAYG model that doesn't really require this limitation), you'd have to switch to the old UI to find the button to create one.
I agree with the comments about cloud providers: most applications would be better hosted on a VPS or on services like DigitalOcean. But hey, software developers like to look smart by complicating things.
As a consultant who makes most of their money off of Microsoft technologies, I'm finding more and more reasons NOT to use Azure.
Chief among them is their famously bad support. Just google it - I've never spoken with anyone, or seen a single written word, saying that Azure support is even decent.
It's a race to the bottom platform, and I'm starting to get to the point where I want to start selling AWS.
I cannot count the number of times I've found a Microsoft support forum question that's exactly my problem too, and the officially tagged Microsoft support person fully misunderstands the question and then doesn't even properly answer their own misunderstood version of it.
I see this title and all I can think is, "There's only 400?" I am not impressed with Azure, I wish my company was using AWS, everything about AWS was much more reliable. The Azure Portal is not to be trusted, it can just lie to you at times.
It's like this in a lot of subreddits dedicated to a particular product. The regulars are die-hard $product fans, and respond to perceived negativity just as you'd expect.
Just earlier this week it was charging me for some mysterious "api management service" --- it was such a pain in the ass to cancel. I had no idea what it was about. I contacted support, got them to supposedly reverse the charges and reported the credit card as lost before the charges went through just in case. I just wanted a damn api key and couldn't for the life of me figure out how to find it (I work at a cloud computing company, I do this stuff for a living and I still couldn't figure it out).
It's an obtusely confusing interface with opaque pricing, full of innocent-sounding things you can just magically sign up for. It's like the dark-pattern people from Intuit came over for a house party and got drunk one night.
The great thing about Azure is not the security, it's the reports about how secure you are. The latter is legally required; the former is only visible to experts.
Microsoft's cloud is hosting/protecting stretchoid.com, from which I get scans (hack attempts?) all the time. I am self-hosted, and those are a pain. As far as I know, stretchoid.com is not selling the scan data...
I plan to drop IPv4 and go no-DNS/IPv6 (a /64 or /96 prefix) for self-hosting.
The Microsoft Graph SDK is the worst piece of shit I ever saw. The ONLY actually good part is the JS SDK. (I know, right?!) Aside from the horrible DX, it is FULL of bugs.
I was tasked with a project where it made sense to use Azure Durable Functions. Again... BUGS ... I reported a couple of them and even went and spoke with the product team about those. One BUG was due to a misunderstanding of how the framework works (in my defense, the documentation was very unclear) and the rest of the bugs are still not fixed almost 2 years later.
I decided to fail the project and restart with a different approach and framework.
Working on enterprise or higher-level Microsoft is a way to get grey hair fast. All the way back to Server 2003, we had the infuriating inconsistency of group policy, roaming profiles, and DFS drives. Everything is full of errors; you will have a larger IT team as a result, just to deal with the headaches.
After using Google Workspace for IT, and AWS for infra, I always tell people to stay far away from Microsoft.
Even now, I have a friend who honestly can't deploy Intune, because of inconsistencies in the "type" of enrollment and whether it can execute a winget script as a result. Despite both machines being enrolled in Intune, the one that was enrolled during OOBE can run the scripts, but the machine enrolled from within the OS cannot. Microsoft support has had that ticket for weeks.
Meh, as a developer who lived through the 2000-2010 era Microsoft, it’s easy to come up with a laundry list of reasons to hate on Azure.
But I tried Azure for my most recent startup because I was offended by AWS, and GCP did not have enough adoption among my customers, and Azure worked - fine.
What do you really need out of a cloud?
I want them to rent me VMs, for them to not go down, and to make it easy to do standard stuff like an object store, run containers, run databases, etc.
That’s how clouds try to lock you in, by making you use a custom tool that is different for the sake of being different.
If you use standard tools you don’t have this problem.
Containers running on VMs is standard.
A mesh of microservices that depend on cloud queues and managed services is not.
One argument against standard containers is saving dev time. You can still save dev time by using standard open source software. How many different ways are there to implement a queue or a load balancer?
If you really need access to some proprietary technology, then by all means use the cloud that offers it. E.g., if your customer demands GPT-4.5, then go with Azure.
But if you need something standard, don’t get caught in the trap.
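To make the queue point concrete, here's a minimal sketch of a "standard" queue on stock Redis, assuming the open-source redis-py client (names are illustrative). Nothing about it is tied to any one cloud; it runs the same against a cheap VPS or a managed Redis anywhere:

    import redis  # plain open-source Redis client (redis-py)

    r = redis.Redis(host="localhost", port=6379)

    # Producer: push work onto a plain list used as a queue.
    r.lpush("jobs", "encode-video-123")

    # Consumer: block until work arrives, then process it.
    _key, job = r.brpop("jobs")
    print("processing", job.decode())

Swapping the host from one provider's VM to another's is a one-line change, which is exactly the portability the proprietary managed queues take away.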
I am an older guy who was building Kubernetes clusters before EKS, AKS, GKE. So I used Terraform to build shit out to make it happen. Azure was 5x the code just to be different. You can try to blame Terraform, but if you used MS's custom tooling it was no different.
What about the way Terraform is a third-class citizen on Azure? And there are multiple ever-changing ways of doing everything, major parameters aren't supported, etc. It just makes everything more difficult to deal with.
> I will just say that Azure seems to want to do shit different for the sake of being different.
That's Microsoft's MO in a nutshell, in my experience, and I say this as a recent (~5 yrs ago) convert to Linux who built a career on Windows endpoints, servers, ADDS, Exchange, SCCM, you name it. It's how they achieve lock-in to their ecosystem, and it's incredibly frustrating to see how they've just layered that method of operation over and over again, decade after decade, rather than fix anything.
Conversely, doing things "the same way" as AWS would mean copying their first-generation public cloud design flaws.
The overall UX of AWS is absolutely crazy. It's easy to "lose" a resource... in there... somewhere... in one of the many portals, in some region, costing you money! Meanwhile, Azure shows you a single pane of glass across all resource types in all regions. It's also fairly trivial to query across all subscriptions (equivalent to AWS accounts).
Similarly, AWS insists on peppering their UI with random-looking internal identifiers. These are meaningless and not sortable in any useful way.
Azure, in comparison, allows users to group resources under "English" resource group names, and then resources within them also have user-specified names. The only random identifiers are the subscription GUIDs, but even those have user-assignable display name aliases.
The unified Portal and scripting experience of Azure Resource Manager is a true "second generation" public cloud, and is much closer to the experience of using Kubernetes, which is also a "second gen" system developed out of Borg. E.g., in K8s you also get a single pane of glass, human-named namespaces (= resource groups), human-named workloads, etc.
A single pane of glass that shows all your resources currently choking due to hidden limitations is no flex over AWS. It is my hope that I never have to use Azure ever again, professionally or otherwise.
This. We've used GCP App Engine for years and it is rock solid. Their SRE game is top level, and when there is an outage, they do a serious investigation and make it fully public, even if they screwed up badly. Including the vital "this is how we're going to stop this ever happening again". The last outage (that we noticed) was several years ago.
Azure tends to bite back when they upgrade their backend and everything breaks.
It happened twice with my Kubernetes deployment: first, something with node groups made them incompatible and I had to recreate the cluster from scratch; then one of their scripts for rotating key access to volumes (which one has to run manually, go figure) stopped working and caused my volumes to detach from pods, so I had to recreate the cluster again, and I just gave up.
I was super happy as well, the first two years. By year six I was fully migrated out.
I need it to be sane, to work, and to be reasonably well documented.
Azure fails outright on points 1 and 3, and limps by on 2.
The products are a confusing mess, there are way too many ways to auth things, the docs are garbage, and managing stuff via Terraform - which I tried multiple times - broke far too much to be excusable, to say nothing of the dumpster fire that is their UX.
I'm sure some people have either beaten it into submission, or have stockholmed themselves into putting up with it.
Asking: Okay, clearly there are a lot of people here with lots of experience running their software on cloud services from Microsoft, Amazon, Google, etc. Good.

But what about a solo founder running their Web site on a "full tower" computer they plugged together themselves? Why use a cloud server farm, with its expense and complexity?

Instead, get a motherboard, a processor with 16 cores and a 4+ GHz clock, 128 GB of main memory, some rotating and/or solid state disks for a total of 20 TB or so, some external disks for backup, a recent copy of Windows Server, applications software from .NET, and a 1 Gbps Internet connection. The computer - tower case, power supply, motherboard, processor, disks, solid state disks, and Windows Server - costs ~$3000.

So, a 1 Gbps Internet connection, ~$100 a month, would have a capacity of, say, 100 MB/s. If sending a Web page of 200 KB, the peak capacity would be

100 MB/s / 200 KB = 500 pages/second

Then 500 pages a second, with 5 ads per page and revenue of, say, $2 per thousand ads sent (CPM), would be

500 * 5 * 2 / 1000 = $5/second

at peak capacity, or maybe an average of half that, $2.50/second. At 16 hours a day (16 * 3600 = 57,600 seconds), that would be revenue of

2.50 * 57,600 * 30 = $4,320,000/month

For the electric power, at 200 W and $0.10/kWh, that would be

200 * 24 * 30 * 0.10 / 1000 = $14.40/month

How many users? With peak capacity of 500 pages a second and an average of half that, 250 pages a second, for 16 hours a day, that would be

250 * 57,600 * 30 = 432,000,000

pages a month. If on average a user is sent 5 pages per visit, that would be

432,000,000 / 5 = 86,400,000

visits a month. If users come on average 2 times a week, that is 8 times a month, or

86,400,000 / 8 = 10,800,000 users

from one tower case and some Web page software.

So, why use a cloud server farm with its expense and complexity?
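The same back-of-the-envelope as a tiny Python sketch, where every constant is one of the assumptions above (nothing measured):

    # Back-of-the-envelope for one box on a 1 Gbps link.
    # Every constant is an assumption from the text above.
    link_bytes_per_s = 100e6   # ~1 Gbps, rounded down to 100 MB/s
    page_bytes = 200e3         # 200 KB per page
    ads_per_page = 5
    cpm_dollars = 2.0          # $ per thousand ads sent
    hours_per_day = 16
    days_per_month = 30

    peak_pages_per_s = link_bytes_per_s / page_bytes   # 500
    avg_pages_per_s = peak_pages_per_s / 2             # 250
    secs_per_month = hours_per_day * 3600 * days_per_month

    pages_per_month = avg_pages_per_s * secs_per_month
    revenue = pages_per_month * ads_per_page * cpm_dollars / 1000
    print(f"{pages_per_month:,.0f} pages/month, ${revenue:,.0f}/month")
    # -> 432,000,000 pages/month, $4,320,000/month

Of course, the bandwidth is the only constraint modeled here; sustained page rates, CPMs, and fill rates in the real world would be far lower.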
Redundancy and failover capability, mostly (in terms of everything: power, network, storage, compute). For a hobby project, what you describe is probably fine. For a real business, do you want to tell customers you're down because your one computer shit the bed and you have to run to Best Buy for a new motherboard?
> For a hobby project, what you describe is probably fine.
Thanks! Yup, it will be "a hobby project" until, if ever, it gets some users and revenue. Then: have several servers, some load balancing and redundancy, uninterruptible power with a generator outdoors on a concrete slab, contact Cloudflare or some such and have them do what is needed, etc.
Just looked up SSL, reverse proxies, firewalls, etc. Okay.
If you're virtualized on your host: 2x HAProxy on top of OpenBSD, utilizing CARP. It's great fun to set up and run - and once you have it running, it's stupid stable. Very little maintenance required.
Your local site can't implement DDoS protection though; you have to buy that from Cloudflare or some other reverse proxy anyway, so the cloud is always in your future in some form. Also, your local site can't move to Australia or Taiwan or wherever when you realize your user base is more global than you thought.
I mean, your intuition is correct that "cloud" is mostly just a bunch of boring, standard computers running boring standard software and there isn't anything they do that you can't.
But at the same time, boring standard software is (by definition!) commoditized and if you're spinning up some new and interesting thing, it's only going to be differentiated by the parts that are not boring and standard. So put the boring standard stuff on a credit card and do the interesting stuff instead.
If you have the skills, and the bottom line is still positive (include opportunity costs and personnel costs, ease of getting SOC2 / ISO certification if that's relevant to you, ease of scaling up and down), then you should go for other solutions.
I advise CEOs of SMEs on this, and I can tell you that the main concern they have is the availability of people to build and maintain the systems. Because cloud / k8s is more popular these days, that's what they go for. If we could reliably find smart system operators who would happily maintain a couple of racks of servers for years, it would be a more viable option.
Where’s the firewall? Reverse proxy? SSL certificate management? High availability? Patch management? SCM? Central logging and alerting?
As someone who has run many IIS boxes since IIS 4, I greatly prefer Azure websites over having to worry about all the "other stuff" that comes with on-prem. Yes, the cloud needs an RP, WAF, etc., but they're always HA and simple services, not another box to maintain.
Careful, anytime the topic of self-hosting comes up, you will see a bunch of engineers crawl out of the woodwork to insist only the cloud can handle the complexity.
Because you might need 20,000 of those "servers" immediately, and not have the up-front capital for an investment like that. And maybe it doesn't work out, and you only needed those servers for 2 yrs vs. your depreciation rate.
And you'd need about 19,997 fewer of those servers if you got rid of all the scummy adtech infrastructure you're implementing and just focused on the core product. Unless your core product is ads and marketing data, in which case boil those oceans so that you'll be able to show mattress ads to someone who's 0.3% more likely to be influenced to buy a new mattress when they buy a pack of gum if they're running dark mode and have a Mac but use an Android phone and are located within 217 miles of Arkabutla, but only if it's raining.
I don't see colocation charges in there, unless you were planning on running your "server" out of your house on a residential internet connection (which probably has restrictions on acceptable use).
By the time you spec out a real server (redundant power, higher quality components than Newegg stuff), rent some space in a rack, pay for bandwidth (50Mbps is going to be about it without paying a premium), you're going to be looking at $5000 + $300/m. All that effort, whereas you could spin up something in the cloud for a bit more per month.
This does flip quickly, however. Once you get into the high 5-figure monthly spend, running your own hardware makes sense again. DHH's blog posts on 'Leaving the Cloud' are a great read.
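A toy break-even sketch using the hypothetical figures above (not quotes from any provider):

    # Toy break-even: ~$5000 up front + $300/mo colo vs. a cloud bill.
    # Figures are the hypothetical ones from the comment above.
    COLO_UPFRONT, COLO_MONTHLY = 5000, 300

    def breakeven_months(cloud_monthly):
        # Months until the colo's up-front cost is recovered.
        return COLO_UPFRONT / (cloud_monthly - COLO_MONTHLY)

    for cloud in (500, 1000, 5000):
        print(f"${cloud}/mo cloud -> {breakeven_months(cloud):.1f} months")

Which matches the intuition: at a few hundred a month of cloud spend the payback period is years, while at four or five figures a month it shrinks to months or weeks.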
If your stuff can fit on a single server at home, and are comfortable managing it, by all means do! It’s definitely way WAY cheaper, and if budget matters, that’s great. Nothing wrong with that IMHO. Obviously you can’t 100x overnight, but that’s realistically not gonna happen. And if it does, then you can start to migrate, which probably won’t be that hard, because it’s just impossible to make a single machine anywhere near as complicated as a cloud setup.
If you need an active component near your customers for low latency responses, the cloud makes it very cheap to deploy tiny VMs or small containers all over the place. It’s trivial to template this out, scale up and down for follow-the-sun or to account for local traffic spikes.
If you need it, you need it, and nothing else meets this need except perhaps some CDNs with “edge compute” capabilities — however those are quite limited.
Azure has its issues, but this kind of extreme take is hardly useful. Any large-scale cloud provider has problems - AWS has had its fair share of major outages too.