

Microsoft's Azure cloud down and out for 8 hours - pshken
http://www.theregister.co.uk/2012/02/29/windows_azure_outage/

======
powertower
This seems to be about some odd certificate issue, not the network, which
caused Microsoft to take access to its service management system down.

From the article itself:

> It later added that less than 3.8 per cent of hosted services had been
> affected.

If this was about Google or Apple, this submission would already have been
flagged, taken off the front page, and several accounts would have been
hellbanned.

</endjoke, but it's really true>

It will be interesting to see if Microsoft expands on this issue and if we can
learn something about it... Perhaps other datacenters are also vulnerable to
cert issues in their management systems.

~~~
stickfigure
Compare this to administering machines yourself: Your "admin console" goes
down every night for 8 hours while your sysadmin is asleep.

~~~
ithkuil
It's not unusual for my sysadmin to be woken up in the middle of the night.

~~~
stickfigure
Lucky him.

(or her)

~~~
danielsoneg
In my rather limited experience as a sysadmin, I've found avoiding those calls
to be one of my best motivators: My systems work like they're supposed to
because I like my sleep.

Consider it a performance-based bonus.

------
bry
The most frustrating thing for me was the complete lack of any real
communication from Microsoft. For a while, even their status dashboard was
down. I only found out about it after I got a PagerDuty alert and had to
search Twitter (other people complaining about it) to confirm.

We have an Azure CDN backed by a Compute Instance, and zero official notice
from Microsoft about this still. I've learned more about the problem from news
articles than the company that provides the service. Fortunately we haven't
finished migrating the rest of the site to Azure. No emails from them,
nothing. Not even a tweet on their official @WindowsAzure account.
Frustrating.

~~~
TeHCrAzY
They finally pushed something out via Twitter, about an hour ago:
<http://twitter.com/#!/WindowsAzure/status/174954154362548224>

------
d4nt
I guess this sort of thing was bound to happen at some point. As the article
says, all the major cloud platforms have had outages at some point.

As someone who is actively considering building products using this platform
I'm keen to see how well they manage this issue. How do they communicate
during the outage, how open about what went wrong are they afterwards, do they
learn from it, and so on. I particularly like the efforts Amazon have gone to
in the past to show they are learning from these issues (see
<http://aws.amazon.com/message/65648/>) I'm hopeful that Microsoft will show
the same level of openness.

------
chollida1
I feel bad whenever I hear about outages at companies like this.

I'm in charge of technology for a hedge fund with 15 people and I get freaked
out each time we roll out a new piece of software.

------
latchkey
Heroku dynos were down for a lot of clients for several hours on Saturday.
One site that I use a lot, Intercom.io, was completely off the grid because of
this. I was wondering why there was no news about it here.

<https://status.heroku.com/incident/308>

~~~
andypants
I'm sure I read about the Heroku downtime on Hacker News. Maybe it got pushed
off the front page pretty quickly though.

~~~
krobertson
It was on the front page for a while, but it was the weekend, so it likely
didn't garner as much attention. And it was only about 2 hours, not 8-9+ like
Azure.

------
Tloewald
So is this a Feb 29 bug? Problem occurred, it seems, at the advent of 2/29
GMT. Worst date handling ever?

~~~
Maxious
Apparently electronic payment systems from ATMs to merchant terminals to HMO
claim machines all went straight to March 1st. Hilarity ensued.

~~~
bouncing
Yeah, and I noticed my paycheck arrived a day early.

------
speedracr
As a non-tech observer, Azure actually struck me as an honest attempt by
Microsoft to add a compelling offer to the mix. However, their status
page-cum-website seems to be hosted on Azure itself, which is ridiculous in a
situation like this. (Almost like Twitter having a status page on Tumblr.)
Worst of all, even www.twitter.com/windowsazure offers no comment at all so
far. Isn't this wiping out any credibility they might have built up with
developers? Is anyone affected?

Edit: if this truly is because of 2/29, I guess anyone signing up from now on
will get perfect service.

~~~
ot
> Edit: if this truly is because of 2/29, I guess anyone signing up from now
> on will get perfect service.

At least for the next 4 years

------
TeHCrAzY
Weird, we have a couple of respectable (80+ requests per second) services
running on Azure in the SE Asia zone, and I can't see any problems at all.

That said, we only rely on table storage and our instance count is mostly
static.

------
chubot
So just "service management" is down, but the apps themselves are up? If so
that's better than Heroku's recent downtime.

~~~
sriramk
Disclaimer: ex-Windows Azure person, no insider knowledge on this particular
outage

That's because the 'service management service' typically kicks in
when it needs to do things - allocate capacity, restart things when stuff goes
down, etc. By default, it isn't touching the running apps inside VMs. There is
no Windows Azure equivalent to Heroku's routing mesh to be taken down; the
requests go to the VMs directly via the various networking layers.

------
ahrens
We had problems today. We had the bad luck that one of our web roles crashed
during the time the admin interface was down. That meant we couldn't restart
it and neither could Microsoft. We will be adding instances to the role to
avoid similar problems. Otherwise, we are very happy with Azure.

~~~
TeHCrAzY
Can you explain what your problem is? Azure will automatically restart
instances that are dead afaik (unless you are saying that the automatic
restarting was broken as well?).

------
dragosstancu
That sucks, we have a bunch of media stored with them and we're looking
forward to using the cloud for some .NET based intensive processing
(reporting).

A bit off topic: I've been trying for days to get Azure to work with .webm or
.ogv files. Maybe I'm using the wrong tool (CloudBerry). I want to be able to
deliver HTML5 video from the cloud but I'm without success for FF users.
Luckily, my video player features a Flash fallback which is awesome.

~~~
barranger
Was the media stored in blob storage? From the article it looks like storage
accounts weren't affected, just the management API and ~4% of service
accounts.

------
securingsincity
Considering they are announcing the Windows 8 consumer preview right now, and
possibly other new cloud-related features, this is not good timing at all.

------
JonoW
Article is not clear whether Azure instances are down, or just the service
management tools? But either way, ouch!

------
noahhs
Today's outage took down our website, our computing grid, everything. And this
afternoon, when Microsoft said "the majority" of Azure clients were back up
and running, we were still in the dark. Dammit.

------
krmmalik
I wonder if this explains why I have been having problems with Siri? I know
that sounds far-fetched, but isn't Apple relying on Azure now?

~~~
jstclair
Don't know why you got downvoted - AFAIK Siri isn't tied to Azure (although it
could be using it for storage), but iCloud runs across both Azure and Amazon.

------
wonderercat
I think it's interesting that the #1 feature on their features page is labeled
"Always up. Always on."

------
cache77
Maybe they forgot to account for leap year in their programming :/
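
For illustration only: the thread is guessing at a leap-day cause and nothing
here confirms what actually went wrong inside Azure. One common way date code
breaks on Feb 29 is computing "same day next year" by bumping the year field.
A minimal Python sketch of that class of bug, with hypothetical helper names
add_years_naive and add_years_safe:

```python
from datetime import date


def add_years_naive(d: date, years: int) -> date:
    # Bumps only the year field. Raises ValueError when d is Feb 29 and
    # the target year is not a leap year, because that date does not exist.
    return d.replace(year=d.year + years)


def add_years_safe(d: date, years: int) -> date:
    # Same idea, but falls back to Feb 28 when the target year has no Feb 29.
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)
```

Clamping to Feb 28 is one possible policy; rolling forward to March 1 is
another. Which is right depends on what the date is used for.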

------
rmc
Not Itchy & Scratchy Land Europe!

------
dutchbrit
<http://www.windowsazure.com> isn't even loading here.

ping www.windowsazure.com

PING wamktg-prod-db-001.cloudapp.net (65.52.64.144): 56 data bytes

Request timeout for icmp_seq 0

Request timeout for icmp_seq 1

Request timeout for icmp_seq 2

Request timeout for icmp_seq 3

Request timeout for icmp_seq 4

Request timeout for icmp_seq 5

~~~
NARKOZ
ping responses are disabled

~~~
tathagatadg
Makes sense, but at least
www.windowsazure.com/en-us/support/service-dashboard/ shouldn't time out ...

~~~
dutchbrit
Exactly.. Not sure why my comment was downvoted - sure, maybe pinging is
disabled and that part of my comment wasn't valid. But still, the site isn't
responding.
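
Since ICMP echo may be blocked, as noted above, an HTTP request is a more
telling check than ping. A rough Python sketch using only the standard
library; http_status is a hypothetical helper name, and the opener parameter
exists only so the function can be exercised without a network:

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status (e.g. 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: nothing answered.
        return None
```

A None result for the service dashboard URL would mean the site itself is not
answering, whatever ping says.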

------
bouncing
The bad news for Microsoft isn't that Azure is down, it's that only a tiny
number of people even noticed.

~~~
recoiledsnake
Less than 3.8% of users were affected; take your trolling elsewhere.

~~~
huggyface
Is it really trolling? Honestly, who uses Azure? Like most MSDN subscribers I
did some mashing in it -- that still exists -- but did absolutely nothing real
in it. My sense is that very, very few did anything beyond prototyping in it.

And just to add some opinion, the reason I wouldn't even consider it is
Microsoft's absolutely flippant ADD when it comes to online services. I have
zero faith that they won't just shut it down tomorrow.

~~~
cooldeal
A whole bunch of universities and government agencies.

~~~
huggyface
Such as? Microsoft's case studies for Azure are absolutely dismal.

I note the other comment mentions Apple, which was a pre-release beta _rumor_
based upon the IP that a beta iMessage lived at (since moved). It speaks
volumes, I think, when such a disproven pre-release claim is still held as the
example.

~~~
huggyface
-1? For real? "A whole bunch" is not a credible statement of fact.

