
Netflix is Down - bavidar
http://gigaom.com/video/netflix-down-xmas-eve/
======
toomuchtodo
You'd think Netflix would learn by now to move back to their own gear.

EDIT: Downvote away; it's practically dogma on HN to use AWS. How much downtime
are people willing to tolerate for a "superior" technology? Sure, Amazon AWS
has some great ideas and tech, but you might as well give up if your business
depends on EBS in us-east-1 _at all_.

~~~
jrockway
The gear isn't the problem: the dependency on a single data center is the
problem. It requires _a lot_ of software engineering effort to maintain a
service that works when a data center suddenly goes away. To be fault-
tolerant, Netflix has to do this engineering regardless of whether or not they
own the servers. But if they use Amazon, they don't have to actually fix the
_servers_ when they break, freeing up engineering resources to fix the
software.

(Why doesn't Amazon offer transparent replication? Because the price of
replication is unbelievably high and most people can't believe how high it is.
If you want to write to 12 replicas around the world, budget five seconds for
your transaction to complete. Compare this to a MySQL database on an SSD that
can do millions of writes on the same entity group in the same amount of time.

This doesn't even include the cost: 12x the storage cost, and 12x the
bandwidth cost for data going from your frontend to all of the backends.)

~~~
moe
You're blowing this a little out of proportion.

Replicating a few petabytes of _static videos_ is not rocket science nor cost
prohibitive for a company the size of Netflix. Nor is engineering a system
that can withstand a datacenter outage. Especially one as trivial as Netflix
which is largely read-only. Thousands of systems of higher complexity are
engineered to that standard, and many of them are much larger than Netflix.

You rarely hear of Google outages, or iTunes, or Youtube, or [insert six dozen
other popular brands], do you? Yes it can happen to the best of them, but the
EC2 outages are really piling up lately.

~~~
pixl97
_You rarely hear of Google outages_

[http://techcrunch.com/2012/12/10/gmail-experiences-a-
widespr...](http://techcrunch.com/2012/12/10/gmail-experiences-a-widespread-
outage-most-users-affected/)

 _,or iTunes_

[http://appleinsider.com/articles/12/11/19/itunes-match-
down-...](http://appleinsider.com/articles/12/11/19/itunes-match-down-as-
apples-icloud-issues-continue)

 _,or Youtube_

[http://abcnews.go.com/blogs/technology/2012/10/youtube-
goes-...](http://abcnews.go.com/blogs/technology/2012/10/youtube-goes-down-
not-googles-best-day/)

Everyone goes down, but EC2 has been buggy as hell from what I can tell.

And before everyone says Netflix should `just` move or add more bandwidth...

[http://www.nbcnews.com/technology/technolog/netflix-
uses-32-...](http://www.nbcnews.com/technology/technolog/netflix-
uses-32-7-percent-internet-bandwidth-119517)

They represent a huuuge amount of bandwidth usage!

~~~
toomuchtodo
So work with eyeball networks and move your gear to the edge; dumping it all
in AWS and hoping for the best is failing (at best). Do all the
auth/billing/recommendations/transcoding in AWS, and push all the video
content out to boxes at the ISPs.

------
smoyer
Soon it will be news when the N. Virginia services are actually up! I don't
want to be too hard on Amazon because what they've built is pretty amazing,
but I really have to wonder what's different about the N. Virginia site ...
And why they're not having similar failures at the other sites.

~~~
campnic
Based on some rough second-hand estimation [1] it is an order of magnitude
larger than all other AZs and is about as big as them all combined.

[1]: [https://huanliu.wordpress.com/2012/03/13/amazon-data-
center-...](https://huanliu.wordpress.com/2012/03/13/amazon-data-center-size/)

~~~
cpeterso
What is an "AZ"? Availability Zone? "AZ" is not a very Google-friendly
acronym. <:)

~~~
TkTech
You guessed correctly!

------
cowkingdeluxe
Netflix is down because Amazon is down, so we had to watch shows on Amazon
(prime) instead. Funny how that works!

~~~
sjs382
I had the same experience. When watching movies with my SO tonight, I
explained that Netflix uses AWS and that AWS was experiencing problems. We
started watching a movie on Amazon prime and she asked "So I guess Amazon
Video isn't using AWS?"

~~~
antidoh
I tried to watch a movie rented from Amazon over Roku last night (not Prime)
and it was unwatchable. It would get stuck on some scene for seconds to
minutes, occasionally letting loose a bit of audio.

The movie was Inception. Maybe there's some irony in there.

------
criswell
Someone's Christmas Eve just went south. Godspeed.

~~~
zrail
Seriously. Instead of getting angry at Amazon I'm going to wish them a happy
Christmas Eve and hope their collective pagers go quiet soon.

------
chuhnk
<http://status.aws.amazon.com/>

Amazon CloudSearch (N. Virginia) - CloudSearch issues

"6:00 PM PST We are continuing to investigate the issue. Domain creation and
indexing operations continue to be unavailable. Changes to existing domains
may have severe delays in being processed."

Amazon Elastic Compute Cloud (N. Virginia) - Elastic Load Balancer issues

"5:49 PM PST We continue to work on resolving issues with the Elastic Load
Balancing Service in the US-EAST-1 region. Traffic for some ELBs are currently
experiencing significant levels of traffic loss."

------
toddh
This really sucks for the engineers. Not a fun way to spend xmas at all.

~~~
arkonaut
If you're Netflix, I'm sure you have staff/engineers at the office 24/7 during
this time of year specifically. At least, I would hope they do.

------
gurch101
Having been in a similar position in the past, I actually feel bad for all the
people at Amazon and Netflix that'll need to work late tonight...

~~~
jimwalsh
They are getting OT/Holiday pay I'm sure. So they will be ok.

~~~
saraid216
Compensation isn't actually the point here...

------
jongold
I hope y'all have Home Alone on DVD…

------
yoda_sl
The net says it is mostly on the East coast but I am in California just a few
miles from Netflix HQ and it is down for me on my iPad. I should try on the
Apple TV since they mentioned that it is working on other devices. I will be
curious to hear the post mortem of the outage from both AWS and Netflix. The
timing I am sure doesn't help with Christmas Eve around the corner and
probably many engineers not reachable quickly.

~~~
protomyth
Apple TV in St. Paul, MN failed so I don't think the east coast thing is true.
Parents are not happy campers. I get the feeling all iOS devices are included.

~~~
amartya916
If Netflix can deactivate clients based on device size, that'd be pretty neat.
People on tablets–Netflix on my Nexus 7 is out–would usually be watching stuff
alone (kids may huddle around an iPad though) and for Christmas, a movie on a
large screen is probably more important to keep going.

Makes me think that apps should consider using push notifications to tell
consumers about issues, e.g. instead of a cryptic error number, say something
like "Netflix servers are overloaded at the moment; Netflix may not work on
your mobile device".

Not sure if that will piss off people more, but something like that instead of
an error number would probably be better.

------
pbiggar
Interesting that Heroku rode this one out. Any Heroku engineers want to tell
us your LB setup? Do you use ELB?

~~~
zrail
Heroku did _not_ ride this one out. I have an app right now behind a failed
ELB.

<https://status.heroku.com/incidents/479>

Edit: Make that two apps.

~~~
pbiggar
Oh right. I just checked Heroku.com.

~~~
alxndr
heroku.com is up for me (near NYC) as are the Heroku-hosted sites we have at
work.

~~~
zrail
It's not every app for me either. It appears to be a subset of apps that use
the ssl:endpoint add-on to implement HTTPS.

~~~
malyk
And it's not all ssl:endpoint sites either. We've been up through the whole
thing. Which is critical seeing as we are a gifting company that provides an
easy last minute gift!

~~~
alxndr
Great timing!

------
antidoh
Amazon movie rental over Roku is cutting in and out, and I can't watch Netflix
(over Roku) at all. Colorado.

------
arkem
A pedantic note: In the title "EVE" should be capitalized as "Eve".

------
muyuu
Are they insured against a contingency of this calibre? Does Amazon
provide/offer such an insurance?

~~~
zrail
Amazon EC2 has a Service Level Agreement (SLA)[1] that guarantees 99.95%
uptime in a rolling 365 day period, and provides for service credits. Of
course, a huge customer like Netflix can and probably did negotiate their own
SLA terms.

[1]: <http://aws.amazon.com/ec2-sla/>
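
For a sense of scale, the 99.95% figure above can be turned into a downtime allowance. This is only back-of-the-envelope arithmetic on the number quoted in this comment; it assumes the guarantee is measured over all minutes in the 365-day window and says nothing about how the SLA actually defines or measures unavailability.

```python
# Downtime permitted by a 99.95% uptime guarantee over a rolling 365-day period.
# Assumption: uptime is measured against every minute in the window.
UPTIME_GUARANTEE = 0.9995
PERIOD_DAYS = 365

period_minutes = PERIOD_DAYS * 24 * 60  # 525,600 minutes in the window
allowed_downtime_minutes = period_minutes * (1 - UPTIME_GUARANTEE)

print(f"{allowed_downtime_minutes:.1f} minutes "
      f"(~{allowed_downtime_minutes / 60:.2f} hours)")
# prints: 262.8 minutes (~4.38 hours)
```

So a single evening-long outage could plausibly use up most of a year's allowance under that reading.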

~~~
muyuu
I was wondering if they'd pay damages and how that would be calculated.

I don't think free service for a while would cut it with a contingency of this
sort. Maybe someone has first-hand information about their contract (or any other
big player's) and can answer this publicly.

~~~
moe
I don't know the Netflix contract but I've been involved with a few big
(telco) SLAs. The penalties are usually calculated with a points-system and a
multiplier that rises according to the duration/impact of an outage.

E.g. the first 15 minutes of an outage may cost 1 point per minute, 15-60
minutes 2 points, and so on. You also have multipliers for the severity
(partial or full outage, customer impact, affected countries, etc.), time-of-
day, and so on.
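
A rough sketch of how such a tiered scheme could be computed, purely for illustration. The first two tiers are the ones from the example above; the open-ended third tier (3 points per minute past the first hour), the single combined `severity_multiplier`, and the function name `outage_points` are all made up here and not taken from any real contract.

```python
# Tiers as (tier end in minutes, points per minute).
# The first two come from the example above; the last one is hypothetical.
DEFAULT_TIERS = ((15, 1.0), (60, 2.0), (float("inf"), 3.0))


def outage_points(duration_minutes, severity_multiplier=1.0, tiers=DEFAULT_TIERS):
    """Sum penalty points across duration tiers, then scale by severity."""
    points = 0.0
    tier_start = 0.0
    for tier_end, rate in tiers:
        if duration_minutes <= tier_start:
            break  # the outage ended before this tier began
        # Only the part of the outage that falls inside this tier counts here.
        minutes_in_tier = min(duration_minutes, tier_end) - tier_start
        points += minutes_in_tier * rate
        tier_start = tier_end
    return points * severity_multiplier


print(outage_points(10))       # 10 minutes, all in the first tier
print(outage_points(45))       # 15 * 1 + 30 * 2
print(outage_points(90, 1.5))  # (15 + 90 + 90) * 1.5
```

A real SLA would typically apply separate multipliers for severity, time of day and so on; folding them into one factor just keeps the sketch short.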

Collected points may then later be traded in for dollars or a nice lawsuit.

Corporate lawyers love to go nuts on these things; an enterprise SLA can
easily span a hundred pages of legalese.

------
bavidar
You would think they would have learned to build redundancy into these types
of systems.

~~~
yoda_sl
Usually Netflix does well when AWS has an outage... At least in the past that
was the case, so it really piques my curiosity as to what the difference is
between this outage and the others. In addition, it will be interesting to know
if the timing of it (just before Xmas) affected the responsiveness of the
engineers/teams at both Amazon and Netflix.

~~~
zrail
The difference is that elastic load balancers are the things that actually
implement the redundancy in AWS, and if one goes down and another can't be
immediately started in its place, no traffic will be getting to the instances
that sit behind it.

I believe AWS has been working on a solution to this, but thus far hasn't
released anything.

------
davidf18
Clearly the work of "The Grinch who Stole Christmas...."

------
sigzero
I just got finished watching something on Netflix?

~~~
Raphael
You don't seem too sure of yourself.

------
lucian303
Cloud with a single point of failure. Merry XMas!

------
jpdevereaux
How is anyone supposed to enjoy the holidays with their families if they
aren't consuming media?

~~~
rhizome
1) some families like to watch stuff together

2) not all netflix subscribers have families

~~~
jpdevereaux
Fair enough. I for one enjoy expressing anti-television sentiments, with and
without company.

~~~
staunch
[http://www.theonion.com/articles/area-man-constantly-
mention...](http://www.theonion.com/articles/area-man-constantly-mentioning-
he-doesnt-own-a-tel,429/)

------
31reasons
What is the purpose of Xmas Eve if you spend it watching TV just like any
other eve? Just saying.

~~~
antidoh
For many of us, watching a movie on TV is anything but just another evening.

Everyone is different, and no one needs to justify their time to anyone.

