
Verizon has no excuse for its planned cloud outage - tshtf
http://www.infoworld.com/article/2865815/cloud-computing/verizon-has-no-excuse-for-planned-cloud-outage.html
======
blfr
Why does Verizon have an enterprise cloud? And why is anyone using it?

Is it somehow related to their main business? Do the servers run deep within
their network allowing quicker access for mobile users? What is their unique
selling proposition?

They have a whole "why Verizon" section[1]. I don't feel like it answers this
question.

[1] [http://cloud.verizon.com/why-verizon](http://cloud.verizon.com/why-verizon)

~~~
ben1040
Verizon Enterprise is the service lineup that they inherited from acquiring
MCI some years ago. MCI, of course, having been kind of a Big Deal as it
relates to the internet way back when.

[http://en.wikipedia.org/wiki/Verizon_Enterprise_Solutions](http://en.wikipedia.org/wiki/Verizon_Enterprise_Solutions)

------
TheDong
This article misses the point of the cloud. The argument for "live
migrations" and lambasting of AWS for bouncing servers for the Xen exploit are
both completely misguided.

The point of the Cloud is not that a server never restarts or has downtime.
It's that your app runs on many servers in different AZs and regions such that
any one server failing is not going to have a real impact and can be easily
and quickly replaced.

AWS, when bouncing their servers, did them one AZ and one region at a time.
Because of that, if you ran across multiple AZs you would get to test your
failure scenarios, but not be down.
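To make the one-AZ-at-a-time point concrete, here is a toy model (a sketch only; the function names are hypothetical and it assumes the app is "up" whenever at least one AZ hosting it is healthy, ignoring capacity and failover time):

```python
from typing import List


def service_up(healthy_azs: List[bool]) -> bool:
    # Toy assumption: the app survives as long as any AZ running it is healthy.
    return any(healthy_azs)


def rolling_maintenance(num_azs: int) -> List[bool]:
    """Take down one AZ at a time; record whether the app stayed up at each step."""
    results = []
    for down in range(num_azs):
        healthy = [az != down for az in range(num_azs)]
        results.append(service_up(healthy))
    return results


def whole_cloud_maintenance(num_azs: int) -> bool:
    """Take every AZ down at once, as in a whole-cloud planned outage."""
    return service_up([False] * num_azs)


print(rolling_maintenance(3))      # [True, True, True]
print(rolling_maintenance(1))      # [False]
print(whole_cloud_maintenance(3))  # False
```

A rolling restart only hurts a deployment that lives in a single AZ, whereas taking the whole cloud down at once defeats any amount of multi-AZ redundancy within that provider.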

The author's focus on other cloud providers having restarts makes it sound
like he's saying "Verizon is bad, but in bad company". There is no comparison
to be made between taking down your whole cloud, and losing a server here and
there.

~~~
t0mas88
Spot on. A random server could go down at any moment, either from a hardware
failure or because your cloud provider decides it needs maintenance. The
latter actually gives you an up-front warning; the former is just "bang". So
any well-designed system, whether it runs on a cloud or on your own
hardware, should be able to deal with this, since it's the reality of servers:
sometimes one breaks.

All of the bigger clouds (Amazon, Rackspace, SoftLayer) handle their
security-related restarts this way: one area/region/DC at a time, carefully
making sure that well-designed systems don't notice downtime.

What Verizon does is a totally different thing and something most "important
but not life threatening" systems do not design for: Taking down the whole
cloud with all servers at the same time.

~~~
devonkim
What hasn't really been said well is that there are constant incidents going
on at large cloud providers, and you don't need a maintenance period like this
one to experience failures. I work in a place where people are still used to
individual machines, addressed by IP alone, staying up constantly, and they
aren't practicing high availability architectures in their cloud environments.
I had to explain to an architect that VMs can simply hiccup in AWS or any other
provider and that just relaunching the instance is fine. The idea of random
failures may be foreign, but I was certainly a bit surprised that someone with
such a rudimentary understanding of infrastructure wouldn't be familiar with the
idea of "have you tried turning it off and on again?"

With that said, I have zero hope that most enterprises will be able to get a
decently solid failover strategy for their cloud applications, because an awful
lot of them are barely able to get anything set up on even one cloud provider
in the most basic layout. Half the folks I'm aware of don't even have cross-AZ
redundancy let alone cross region failover. When people are turning off 3-4
VMs to save money, you're in no position to think about cross provider load
balancing.

Most of the enterprises I'm familiar with would throw Verizon more and more
money in the blind hope that it'll make their service more reliable. They'll
do this to an incredible dollar amount, partly because it'll still be cheaper
than their old, traditional in-house IT that cost multiple orders of magnitude
more but got far worse availability. Stuff like this scheduled maintenance
shows how foolish it is to believe that someone will magically become ultra
reliable by virtue of you paying them to be.

------
specialk
Two whole days of zero service. In this one moment I have lost all
expectations of good service from Verizon's cloud.

It doesn't look like customers were given much warning either. This story was
originally published on the 6th of Jan. Could you imagine trying to find
an alternate hosting setup by the weekend if you have any kind of availability
expectations? It seems like madness to me. Even if you did move yourself to
another host to cover these 48 hours of downtime, how likely are you to move the
majority of your business over to AWS, Google Cloud, Azure, etc.?

The lack of notice on this seems to be a bigger issue to me than the fact that
Verizon is taking their whole cloud out of service for 48 straight hours.
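For scale, a quick back-of-the-envelope calculation (a sketch that assumes this would be the only downtime in a 365-day year; the function name is just for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored


def availability(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the period the service was up."""
    return 1 - downtime_hours / period_hours


print(f"{availability(48):.2%}")  # 99.45% for a single 48-hour outage
print(f"{availability(2):.2%}")   # 99.98% for a 2-hour window, for comparison
```

One planned 48-hour window alone caps the year below "three nines" (99.9%), which allows roughly 8.76 hours of downtime per year.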

~~~
iancarroll
At the start of Google Cloud they had planned outages lasting _weeks_. Now,
you could (with a bit of hassle) move your instance to another region, but it
was very annoying and off-putting.

~~~
obstinate
Really? Is there a link about this? I can't find anything.

~~~
iancarroll
I can't find a source, this is only what I remember. I was an early beta
tester of GCE and I think these were scheduled shortly after public launch.

~~~
obstinate
According to the other person, it is that individual zones may be offline for
a week. That makes sense. A whole cloud being offline for a week doesn't make
sense, but yeah, sometimes datacenters need new network fabrics.

------
lwhalen
The big print giveth, and the small print taketh away. When the reality fails
to live up to the marketing and hype, folks are naturally going to be upset.
However, those of us who actually work in IT for a living expect downtime and
architect for it. In a perfect world, Verizon's customers' reaction should have
been one of mild annoyance - "Oh well, guess we'll fail over to our secondary
for that weekend". Anyone who drank the 'zero downtime' kool-aid will
hopefully treat this as a (deserved) wake-up call that marketing does not
excuse poor engineering on their part. Don't like the fact that your
'immortal' cloud provider goes down in planned (and sometimes unplanned) ways?
Buy your own iron and do it yourself. It's the only way you can be completely
100% certain it's done right.

~~~
themartorana
Two hours is sufferable. Two days is insane. Not everyone has the engineering
talent or time to have auto-deploy multi-cloud failover.

Not to mention that if you buy into certain services, say, any AWS
architecture beyond EC2, failover to another provider becomes a lot closer to
impractical if not impossible.

Two hours is sufferable...

~~~
lwhalen
This is where good sysadmins, and teams thereof, earn their high salaries. You
shouldn't _have_ to suffer through a two-day outage, but it happens. A good
sysadmin will insist on multi-provider hosting, be able to advise you on how
not to get locked into $vendor's precious snowflake cloud implementation, and
myriad other things.

~~~
falcolas
Yup. Hurricane Sandy should have been enough to remind folks of this.

Natural (and unnatural) disasters happen, and you have to be prepared for this
contingency. There are many tools out there which can help with this, but a
competent sysadmin will do a hundredfold more to ensure that your business
continues when the unthinkable happens.

Heck, only today one of Amazon's new datacenters on the east coast had a
three-alarm fire[1]. Didn't end up having any instances fail, but we did notice some
of our services have problems that coincided with the fire, and we were ready
to fail over the moment instances started dropping.

[1] [http://money.cnn.com/2015/01/09/technology/amazon-data-cente...](http://money.cnn.com/2015/01/09/technology/amazon-data-center-fire/index.html)

------
res0nat0r
> Although I never expect 100 percent uptime, planned outages aren't needed if
> the cloud platforms are designed correctly. It's quite possible to do live
> migrations without a server reset these days, yet the cloud providers seem
> to be missing the boat here.

It is slightly more complicated than that.

> There should be no cloud outages, ever. Got that?

Really? That's not how the real world works. This "article" is pretty crap.

~~~
specialk
Yeah, I agree the hyperbolic sentences are over the top. The only way for
anyone to reliably get 100% uptime is to use two or three cloud providers or
their own dedicated boxes somewhere. Even AWS doesn't provide 100% uptime
guarantees. Still, I think the author's underlying point stands: good design
should lead to fewer and smaller planned outages. However, his statements are
over the top in the extreme.

~~~
ctdonath
Two _days_ down, by a world-scale provider, is also over the top.

Nobody expects 100% uptime. They also don't expect 100% down for _days_.

~~~
res0nat0r
Understood. His main point is that no cloud provider should ever have any
downtime which is idiotic and no reputable company would ever promise zero
downtime.

