
Azure is experiencing DNS issues in all regions - HEHENE
http://azure.microsoft.com/en-us/status/

Looks to be a global outage across Americas, Europe, APAC, and Africa. Office 365 is still up for us, but colleagues are saying theirs is down as well.
======
lwansbrough
Gitlab is also currently reporting issues with GCE. I'm unable to push to
their network right now.

Looks like all three major cloud platforms are experiencing problems right
now. Could be internet related?

[https://downdetector.com/status/windows-azure](https://downdetector.com/status/windows-azure)

[https://downdetector.com/status/google-cloud](https://downdetector.com/status/google-cloud)

[https://downdetector.com/status/aws-amazon-web-services](https://downdetector.com/status/aws-amazon-web-services)

------
cheeze
Man... Azure seems to be an order of magnitude worse than AWS and GCP when it
comes to reliability.

Seems like they have tons of global dependencies within their services which
cause these cascading failures rather often... Seems like only a few months
ago we were reading about a global outage that affected auth?

~~~
pojzon
When it comes to outages, recent Azure downtime alone broke our SLA 5 times
over for the whole year.

Though for whatever reason I cannot convince the higher-ups that the switch
was a stupid idea. But hey! We got some credits to spend in Azure to
compensate. (Too bad we had to pay our clients with real cash.)

~~~
salex89
Sometimes I'm not sure how Azure makes any money with the amount of credits
they give out.

I'm exaggerating, of course, but from my POV Azure looks subsidised.

~~~
StudentStuff
Azure is subsidized by Microsoft so they retain customers and mindshare. They
also seem to have a mix of talent working for them. In some areas like
reporting dashboards and such, Azure is head and shoulders better than AWS and
GCP, but few if any of their large clients use these features (missing the
primary value you'd get from using Azure).

Our interaction with the GCP sales team has repeatedly been aggressive US
based sales agents, whereas Microsoft sent us to deal with (incompetent) sales
agents out of India after promising us significant startup credits when we
crossed paths with them at a conference near Seattle.

------
llamataboot
Azure has been orders of magnitude more expensive and less reliable than AWS,
Digital Ocean, or Heroku for the one client of ours that demands we use it to
host their stuff.

Honestly, I can't think of a single reason why I would recommend them over
anyone else at this point, no matter what your hosting or storage or computing
needs were. Can't see a single area where they are better than the
competition.

~~~
dsyko
Azure does have the only HIPAA compliant media transcoding service (as far as
I know). The API is terrible but it works.

~~~
cobookman
Might I ask the use case? I didn't even realize that HIPAA-compliant
transcoding was a thing.

~~~
ailideex
I am guessing that, more than being compliant, it is just some corporate
captive-market thing where there is only one "compliant" transcoder.

------
ed_elliott_asc
I’m at a complete loss as to how, in the 2010s, they can design, build, and
implement a cloud provider from the ground up that has so many “global outages”.

This isn’t “legacy” code that has migrated from COBOL to VB to C#. This is
modern code, and to suffer this bs time and time again is unforgivable.

~~~
jeandejean
That's actually exactly the reason why it's unstable: clean code from scratch
that doesn't cover all the edge cases yet.

We software engineers keep whining about legacy code but forget that it made
it to legacy for a reason...

------
darkhorn
This is not new. DNS queries for the Outlook (Hotmail) servers got no response
from within Azure, so the application I was responsible for was unable to send
emails via SMTP. This issue has appeared a few times in the last few years.

Our customer would bash us if they were unable to use our product for several
minutes, but when Azure was down for a day there were no complaints to
Microsoft. (They were hosting our product on Azure.)

I don't understand their love of Microsoft. I guess because they have
Microsoft certificates like generals have.

------
AcerbicZero
I wonder if this DNS outage will cause data loss like the last one....

The entire "here's free ELA credits for Azure, please please Mr Sr Director/CIO
use Azure" pitch seems to be working, but then they go and do stuff like this.

------
danjc
Yep, we've just had one of our environments alert us. Services totally
unresponsive - even the Azure portal was unresponsive. Affecting services we
run in the US and Europe.

------
ransom1538
Ok, many of these cloud platforms charge _more_ if you have things duplicated
across regions/zones. E.g., AWS has Multi-AZ. Why pay for this? When AWS has a
major issue - all zones/regions are fucked in some way.

~~~
Tehnix
That’s actually a point that AWS tried to hit on during their last re:Invent
conference. When their competitors say multiple availability zones, read the
fine print, and you’ll see that it is far from being as resilient as AWS.
Sometimes other cloud providers have even located the two AZs in the same
building....

AWS AZs on the other hand are always geographically separated, and they even
take into consideration the landscape to see if they need to be further away
(earthquakes etc).

It’s staggering how much of a lead AWS has on others when it comes to their
AZs and global network infrastructure.

------
twhb
Relevant: [https://blog.serverfault.com/2017/01/09/surviving-the-next-d...](https://blog.serverfault.com/2017/01/09/surviving-the-next-dns-attack/)

------
hbcondo714
This would explain why my app kept throwing this exception when attempting to
call an Azure SQL instance:

    
    
      System.ComponentModel.Win32Exception: No such host is known
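
For brief resolution blips (not an hours-long outage like this one), retrying the lookup with backoff is one common mitigation. A minimal sketch in Python rather than .NET; `resolve_with_retry` and its parameters are hypothetical names for illustration, and the defaults are arbitrary assumptions:

```python
import socket
import time


def resolve_with_retry(host, port, attempts=4, base_delay=0.5,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host:port, retrying with exponential backoff on lookup errors.

    `resolver` and `sleep` are injectable so the logic can be exercised
    without touching the network (an assumption of this sketch).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror as exc:
            last_error = exc
            # Do not sleep after the final attempt; just give up.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

With the defaults this waits 0.5 s, 1 s, and 2 s between the four attempts, then re-raises the last error.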

~~~
m3h
Same here. Just keeping the uptime good on our stack is challenging enough.
Now I have to add Azure's downtime to mine.

------
erikpt-work
Looks like the core problem today was a DNS zone delegation issue during a
migration off of legacy DNS servers. I'm not sure how this type of issue can
easily be segmented by region, given the way DNS is designed and how zone
replication fundamentally works.

[https://azure.microsoft.com/en-us/status/history/](https://azure.microsoft.com/en-us/status/history/)

~~~
not_kurt_godel
Yeah, DNS is pretty much an unavoidable single point of failure. Gotta do the
best you can to keep global touches to it as light as possible.
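
One way to keep those touches light on the client side is to cache lookups and fall back to the last good answer when resolution fails. A rough sketch, assuming a fixed cache lifetime (the standard `getaddrinfo` call does not expose the record's real TTL); `StaleOkResolver` is a made-up name, not an existing library class:

```python
import socket
import time


class StaleOkResolver:
    """Cache lookups and serve the last good answer if DNS is failing."""

    def __init__(self, ttl=60.0, resolver=socket.getaddrinfo,
                 clock=time.monotonic):
        self._ttl = ttl              # assumed fixed lifetime, in seconds
        self._resolver = resolver
        self._clock = clock
        self._cache = {}             # (host, port) -> (expires_at, addresses)

    def resolve(self, host, port):
        key = (host, port)
        now = self._clock()
        cached = self._cache.get(key)
        if cached and cached[0] > now:
            return cached[1]         # fresh entry, no DNS query at all
        try:
            addresses = self._resolver(host, port)
        except socket.gaierror:
            if cached:
                return cached[1]     # expired, but better than failing
            raise                    # nothing cached, nothing to fall back on
        self._cache[key] = (now + self._ttl, addresses)
        return addresses
```

The trade-off is that a stale answer may point at an address that has since moved, so this only suits endpoints whose records rarely change.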

------
0_gravitas
Forgive my ignorance, but how do things like this happen at such a massive
scale? I would understand maybe one big area shutting down because of some
sort of internet network issue, but then there should be some redundancies in
place, no? I don't get how it all just goes down across the globe. Is there one
master computer that has just gone down, taking everything with it?

~~~
fprog
The saying goes, "complex systems fail in complex ways". Check out some of the
cloud provider postmortems here for a few fascinating and detailed examples:
[https://github.com/danluu/post-mortems](https://github.com/danluu/post-mortems)

You could say certain failures only occur and cascade under Special
Circumstances. :)

------
runnerr0
One would think after that 2016 Dyn outage people would strongly consider
having an alternate DNS provider....

~~~
jdwithit
First of all, yes, people should be using multiple DNS providers for services
that require very high uptime. Amazon.com itself notably uses Dyn and UltraDNS
as its nameservers.

That said, I'm not sure this would have helped in this case. It seems like
some or all of the problem was _internal_ Azure zones were failing to resolve.
Which no third party DNS provider would be able to mitigate.
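
For public zones, a quick sanity check on the multi-provider point is to group a zone's nameserver hostnames by provider. A sketch under a loud assumption: it treats the last two labels of each hostname as the "provider", which is a naive heuristic (it miscounts providers that spread nameservers across several domains or sit under suffixes like co.uk). The function names and the hostnames in the usage note are hypothetical:

```python
from collections import defaultdict


def group_by_provider(nameservers):
    """Group NS hostnames by their last two labels (naive heuristic)."""
    groups = defaultdict(list)
    for ns in nameservers:
        labels = ns.rstrip(".").lower().split(".")
        provider = ".".join(labels[-2:])
        groups[provider].append(ns)
    return dict(groups)


def has_provider_diversity(nameservers, minimum=2):
    """True if the NS set appears to span at least `minimum` providers."""
    return len(group_by_provider(nameservers)) >= minimum
```

Feeding it `["ns1.provider-a.example.", "ns2.provider-b.example."]` reports two groups, while a list that sits entirely under `provider-a.example` reports one. You would get the real NS list from something like `dig NS yourzone`.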

------
hjk05
Tons of people arguing one way or another here. Does anyone have actual
statistics on the reliability of different cloud providers across different zones?

------
1f60c
Dupe(?):
[https://news.ycombinator.com/item?id=19812578](https://news.ycombinator.com/item?id=19812578)

------
dsfyu404ed
My company is working on a project to move a customer web portal to Azure.
This scenario was brought up and dismissed even though when events like this
happen many of our customers would certainly come to our web portal as part of
assessing the impact. I don't expect this to impact the project though. The
decisions have already been made.

------
zie
All the big clouds regularly have outages.

~~~
outworlder
In _all_ regions? No.

~~~
deathanatos
It's DNS, so it is somewhat inherently global. Route53 isn't region specific
either, so I could see an issue with that having a global effect, too.

~~~
y0y
DNS is also inherently distributed. This should make it resilient to all of
the most common outage scenarios, and is likely why AWS offers a 100% uptime
SLA for Route 53.

I'll be interested in the post-mortem from Azure on this one.

~~~
deathanatos
> _likely why AWS offers a 100% uptime SLA for Route 53_

Well, that's interesting. We occasionally see getaddrinfo() calls fail
claiming domains that we _know_ exist at the failure time (b/c the records are
completely static) don't exist. (We've not got a reproducible case for this
yet, and it's incredibly rare for any given VM/service. But across our fleet,
it crops up fairly regularly.)

~~~
cthalupa
> We occasionally see getaddrinfo() calls fail claiming domains that we know
> exist at the failure time (b/c the records are completely static) don't
> exist.

That could be whatever resolvers you're hitting failing rather than an issue
with Route 53 authoritative nameservers, though. The resolving DNS servers in
EC2 are not actually part of Route 53, for example.

~~~
deathanatos
I'd think that would correspond to EAI_AGAIN or EAI_FAIL, whereas I'm pretty
sure we're getting an EAI_NONAME.
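
To pin down which code actually comes back, the failure can be logged by its EAI_* value rather than just the message. A minimal Python sketch, assuming a Unix-like platform where the `socket` module defines these constants; the category names and the `classify_lookup_error` / `lookup` helpers are made up for illustration, and the exact code returned for a given failure depends on the libc and resolver configuration:

```python
import socket


def classify_lookup_error(exc):
    """Map a socket.gaierror to a coarse category by its EAI_* code."""
    code = exc.errno
    if code == socket.EAI_AGAIN:
        return "transient"          # temporary failure in name resolution
    if code == socket.EAI_FAIL:
        return "resolver-failure"   # non-recoverable resolution failure
    if code == socket.EAI_NONAME:
        return "no-such-name"       # name or service not known
    return "other"


def lookup(host, port=443, resolver=socket.getaddrinfo):
    """Return (category, addresses); addresses is None on failure."""
    try:
        return "ok", resolver(host, port)
    except socket.gaierror as exc:
        return classify_lookup_error(exc), None
```

Counting these categories across a fleet would show whether the rare failures really are "no-such-name" or something the resolver reports as transient.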

------
bdibs
Could this be causing Slack’s downtime?

~~~
lelf
Slack uses AWS.

~~~
tylerhou
AWS was also down for a bit

~~~
cthalupa
What region/service? None of my customers called me complaining, so this seems
a little unlikely :D

~~~
tylerhou
[https://downdetector.com/status/aws-amazon-web-services](https://downdetector.com/status/aws-amazon-web-services)

Around the same time as Microsoft. So it might have been a regional internet
outage.

Here is a screenshot for posterity:
[https://i.imgur.com/dHnD6BO.png](https://i.imgur.com/dHnD6BO.png)

Also, see the top comment on the post:
[https://news.ycombinator.com/item?id=19814181](https://news.ycombinator.com/item?id=19814181)

------
nostrademons
I wonder if this is why www.cdc.gov is (was? appears to be back) offline.

~~~
oldmanhorton
As far as the status page was concerned, this didn't impact Azure Government

------
dRNAcro
Then someone or something is mining-a-like do congestion.

