

A Closer Look At The Christmas Eve Outage - aaronbrethorst
http://techblog.netflix.com/2012/12/a-closer-look-at-christmas-eve-outage.html

======
jkat
Currently in a battle with management about whether to launch on AWS or not.
They want to, engineers don't. Engineers are largely driven by raw CPU
performance. Management, well, they seem to be thinking "no one ever gets
fired for picking..."

Anyway, when you read about all the effort Netflix has put into their cloud
architecture (1) and the hiccups they (and others) still have, I just don't
know what hope our small team of 5 has of success. It seems like, to succeed on
the cloud, you really need to build your app specifically for it (which we
haven't done!)

1 - http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html

~~~
raverbashing
"engineers don't. Engineers are largely driven by raw CPU performance"

There are several reasons not to use AWS. CPU is _not_ one of them, especially
given the range of instance types EC2 offers (though at a price).

But unless you're doing something _very, very_ CPU-intensive (heavy math such
as CFD, integer programming, etc.), this is irrelevant. My bet is that you
aren't.

~~~
jkat
I know I was vague, but we really are CPU-bound. We do some fairly heavy set
and bitwise operations. Our smallest unit of work is about 3x-5x slower on AWS
(while costing about 4x more than alternatives). This can't be further
parallelized without a rewrite. The problem is compounded by the negative
impact this has on concurrency, which _can_ be solved by adding more machines,
but that just makes AWS that much more expensive.
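
To make the cost comparison concrete: if the "4x more" figure is a per-machine
price and the 3x-5x slowdown applies to each unit of work, the two multiply. A
minimal sketch under that assumption (the function name is hypothetical, and
it ignores concurrency and discounts):

```python
# Rough per-unit-of-work cost comparison using the figures quoted above.
# Assumption: "4x more" is the per-machine-hour price ratio, and cost scales
# linearly with machine-hours spent on each unit of work.

def relative_cost_per_unit(slowdown: float, price_ratio: float) -> float:
    """Cost of one unit of work on AWS relative to the alternative.

    slowdown: how many times longer one unit of work takes on AWS.
    price_ratio: how many times more an AWS machine costs per hour.
    """
    return slowdown * price_ratio


low = relative_cost_per_unit(3.0, 4.0)
high = relative_cost_per_unit(5.0, 4.0)
print(f"AWS costs roughly {low:.0f}x-{high:.0f}x more per unit of work")
```

Under those assumptions it prints "AWS costs roughly 12x-20x more per unit of
work", which is why adding machines to recover concurrency only widens the gap.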

A different, less important system does image manipulation. This is also very
CPU-sensitive.

~~~
raverbashing
Interesting.

Maybe I need to rephrase what I said: AWS instances are very bad at CPU, but
the flexibility to add more instances and different instance types can
compensate for that.

Still, maybe try Linode or Rackspace and compare their CPU performance.

------
ChuckMcM
Chaos Monkey fights back :-) I wonder if there is a way for the CM team to
simulate ELB outages. At some point you entertain the idea that Amazon goes
offline, but probabilistically, is that similar to a 9.0 magnitude quake _and_
a 30 m tsunami on the same day?

~~~
lucian303
Correct. Probabilistically, I'd say it's the same, as long as that earthquake
is under the ocean. Yeah, you'll most likely get a 30 m tsunami with a
magnitude 9.0 earthquake (I'm no geologist, though).

So yeah, AWS going down is something _every_ company that runs its services on
AWS, including providers like Heroku, should take into account in its
architecture. It's not a matter of if, it's a matter of when. You _will_ have
downtime.

Period.

------
plasma
Why is CPU performance on EC2 so terrible compared to dedicated servers?

------
lucian303
"The Netflix Web site remained up throughout the incident, supporting sign up
of new customers and streaming to Macs and PCs, although at times with higher
latency and a likelihood of needing to retry. Over-all streaming playback via
Macs and PCs was only slightly reduced from normal levels."

This is simply false. I tried Netflix streaming on both my Macs and my Roku
and neither worked. The site may have been up, but streaming was down for Macs
(and, I assume, computers in general), not just TV boxes like Roku.

~~~
Soulsbane
<http://dvd.netflix.com/> is down right now and has been every time I've
checked today: "We're Sorry. The Netflix site is temporarily unavailable. Our
engineers are working hard to bring the site back up as quickly as possible."

Of course, I'm sure it's not a priority over the streaming portion, for obvious
reasons.

~~~
Bill_Dimm
It's been up for me for at least a little while now, but I think it was down
for me all afternoon (Eastern time zone). This supposedly only affected "some"
of the DVD customers:
http://www.bloomberg.com/news/2012-12-31/netflix-says-some-customers-can-t-access-dvd-portion-of-website.html?cmpid=yhoo

