Azure was having DNS issues (azure.com)
98 points by ddb on April 1, 2021 | 48 comments



Alternate status page, of course showing all green across the board in typical Azure fashion. https://status2.azure.com/


That link gives me a DNS error. Edit: Working now as of 6:08 PM EDT


Does this have to be posted every time some cloud service has issues?

We, as in people in this forum, know that status pages are worthless. They’re tools with the explicit purpose of reducing the burden on tier one customer support. That’s it. They are not a public monitoring platform.


As an alternate example, GitHub tends to have a pretty good status page in my experience. It'll usually be up-to-date within minutes of people chatting about issues on work Slack and gets updated at a regular cadence with details.

AWS on the other hand... We usually just reach out to our TAMs and say "Hey, our application monitoring is showing tons of errors interacting with service X--can you check your super secret internal dashboards and see what the deal is?" It's nice to at least have a "Yeah the service is completely hosed and in a bad place" or a "Yeah, some changes went wrong and are being rolled back". The former usually requires some sort of mitigation while the latter can largely be ignored.


Having a fully up to date status page is what prevents useless repetitive cases during degradation or an outage. If I'm having issues but you show all green, I'm submitting a case.


For large incumbents, publishing a complete and accurate status page might not only be a recipe for bad press, but also lawsuits. There's significant downside and not much upside to telling the whole truth. It's entirely unsurprising that cloud providers like AWS or Azure would play definitional games with what constitutes an outage. Rolling 5m, 1% outage across your entire customer base? That's just a hiccup! If your staff has to field more confused questions and complaints, it's a relatively small price to pay.


I was on support until a few minutes ago (SQL DB, not general Azure), and the status page was ironically my first indication something was up since said status page was having DNS issues.


Their health page was down too. This was the first place where I saw something.

A colleague saw it on Twitter.

So yes, it's useful.


That page says Azure DNS is all good. Seems wrong.


GitHub is also having problems. https://www.githubstatus.com/

Coincidence, or have they gotten around to moving some of their infrastructure to Azure since the MS acquisition?


Our GitLab CI is failing due to https://mcr.microsoft.com/v2/ being down.


AFAIK Actions (maybe Packages, too?) was always built on Azure. I think GitHub core is on bare metal.

I'm guessing they probably get the elasticity of cloud while paying the wholesale or at-cost price of the infrastructure (surely they get some discount over the advertised price, at least).


My guess is the latter.


https://azure.microsoft.com/en-us/updates/azuredns100sla/?cd...

> Azure DNS is now being offered at a 100% availability SLA that's backed by our diverse, geo-redundant DNS infrastructure.

> With this update, Azure DNS guarantees that valid DNS requests will receive a response from at least one name server 100% of the time. For details, see the SLA definition.

This hasn't aged well.


If I understand this correctly, everything unavailable is eligible for a 25% credit, and if the downtime exceeds ~4 hours then it's a free month.

Neat!
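For anyone checking the arithmetic: a "~4 hours" cutoff is roughly what a 99.5% monthly availability threshold works out to. That 99.5% figure and the 30-day month are assumptions used for illustration here, not numbers quoted in the thread or taken from the SLA text. A minimal sketch of the calculation:

```python
def downtime_budget_hours(availability_percent: float, days_in_month: int = 30) -> float:
    """Hours of downtime in a month that still meet the given availability.

    availability_percent and days_in_month are illustrative inputs; the real
    SLA tiers should be read from the provider's SLA document.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_hours = days_in_month * 24
    return total_hours * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical thresholds, chosen only to show the shape of the math.
    for threshold in (100.0, 99.99, 99.5):
        hours = downtime_budget_hours(threshold)
        print(f"{threshold}% over 30 days allows about {hours:.2f} hours of downtime")
```

At a 100% threshold the budget is zero, so any failed valid request counts against the SLA.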


Free month of Azure DNS, or free month of everything that you can't reach because DNS is down? ;)


Whichever they prefer


Wow - I can't even install dotnet on a linux machine because the packages repo is down.


Noticeable spikes across the board ...

https://downdetector.com/


Coinbase is ~completely down (?)

I'm now seeing all outbound DNS lookups from my Heroku instances failing.

What is going on?


I have seen nothing to indicate this is the case, but is this what a DDoS attack looks like?


Thank goodness I’m not on call!


How do you know? Together with low TTL it seems plausible. But I don't understand how you can know it this fast.


IcM is an internal tracking tool at Microsoft. It sounds like they looked at the tool and noticed this. I used to work there and use it as well.


^


Azure is not having a good year. Two major Active Directory outages, a major CosmosDB outage, and now this.


The sad part is that even though that feels like a lot given the time span, it doesn't feel that bad given it's Azure. These network issues have been plaguing us ever since we started using AD.


Apparently AD doesn’t scale. Who knew. /s


Is that all between January and April?


Let's Encrypt is also down and can't issue certs.


These problems are actually good in that more organizations - especially big ones - realize that reliability is not something that you can outsource to a single provider and the problem magically disappears. Literally any service you depend on, from DNS to email, should be using redundancy so that when your basket gets squashed, you still have some eggs. Neither Amazon nor Microsoft will tell you that because vendor lock-in is in their best interest. You need to take care of it yourself otherwise you are completely at their mercy.


https://twitter.com/AzureSupport/status/1377737333307437059

> We are aware of an issue affecting the Azure Portal and Azure services, please visit our alternate Status Page here https://status2.azure.com for more information and updates.


Which was also down until just a few moments ago. Seems like poor planning, but maybe it was just overloaded with requests.


There have been some internal debates about setting up a secondary DNS in case Route53 somehow went down. My reasoning has thus far been that if Route53 is down, there are probably other AWS services that we depend on that would also be down.

What do you guys think? Is secondary DNS in this case worth it?


We're moving towards dual DNS providers. We've been bitten by DNS hosting failures too many times, and they're always painful because everything, including monitoring and control systems, ends up dead or inaccessible. Not to mention the entire network being down is absurdly expensive if you're paying SLAs.
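The dual-provider argument can be sketched as a toy model: a zone stays resolvable as long as at least one name server in its delegation belongs to a provider that is still up. The name server hostnames and provider labels below are hypothetical placeholders, and the model ignores caching, timeouts, and retry behavior in real resolvers.

```python
from typing import Dict, Set


def zone_resolvable(ns_to_provider: Dict[str, str], providers_down: Set[str]) -> bool:
    """Return True if at least one delegated name server is on a healthy provider."""
    return any(provider not in providers_down for provider in ns_to_provider.values())


# Hypothetical delegations: one zone on a single provider, one split across two.
single_provider = {
    "ns1.provider-a.example": "provider-a",
    "ns2.provider-a.example": "provider-a",
}
dual_provider = {
    "ns1.provider-a.example": "provider-a",
    "ns1.provider-b.example": "provider-b",
}

if __name__ == "__main__":
    outage = {"provider-a"}
    print("single provider survives:", zone_resolvable(single_provider, outage))
    print("dual provider survives:", zone_resolvable(dual_provider, outage))
```

This only covers name resolution. If the services behind those names run on the provider that is down, as the Route53 comment above points out, a second DNS provider does not bring them back; it mainly keeps monitoring, status pages, and anything hosted elsewhere reachable.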


I wasn't able to get to https://status.azure.com/en-us/status either but now I can and everything shows "green" including Azure DNS


I'm seeing some errors in my applications, but most requests are still working somehow.


Perhaps the errors are only partially logged, since it's a DNS issue (and a potential DDoS).


Can't get to the status page even. Seeing issues with Microsoft Teams also.


DNS responses for our app servers in Azure were failing and now they are taking ~4000ms. They have a really short TTL too, which really exacerbates the issue.
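One client-side mitigation for the short-TTL problem is to keep serving the last known answer for a while when a refresh fails, in the spirit of "serve stale" DNS behavior. A minimal sketch, assuming a caller-supplied `resolve` function that raises `OSError` on failure (as `socket.gaierror` does); the class name, parameters, and time limits are all hypothetical:

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Cache lookups for ttl_seconds; on resolver failure, reuse an expired
    answer for up to max_stale_seconds past its TTL instead of failing."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl_seconds: float,
        max_stale_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._max_stale = max_stale_seconds
        self._clock = clock
        # name -> (address, time the answer was stored)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # still fresh, no upstream query needed
        try:
            address = self._resolve(name)
        except OSError:
            # Upstream failed: fall back to the stale answer if it is recent enough.
            if entry is not None and now - entry[1] < self._ttl + self._max_stale:
                return entry[0]
            raise
        self._entries[name] = (address, now)
        return address
```

The trade-off is that a stale address may point at a host that has since moved, so the stale window should be short enough that this is acceptable for the application. This sketch also does nothing about slow (rather than failing) lookups; that would need a timeout inside the `resolve` function.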


I haven't been able to get to the status page. I noticed the problem because I currently can't connect to Service Bus nor Storage.


Hm...I'm not able to access the status page, but some of my coworkers are (as of 5:35 PM EST).


portal.azure.com and status.azure.com are still down in NA Pacific.


They just can't seem to keep the lights on.


"Serverless"


It's always DNS!


I guess the Protocol Police is doing its first major raid.


Most unreliable cloud provider



