Again, not right now, but at some point this sort of outage becomes critical and will probably deserve more scrutiny.
I’m impressed by how reliable Netflix is, hopefully this just pushes them harder.
And maybe even prepares them for a less neutral internet.
Local stations regularly go off air. Sure, some stations outside the range of your antenna might still work, but effectively they're just down. For national events they will fail over to the local stations and keep broadcasting, but if the content isn't there, they're just down.
_"We have so much we want to do in our area, so we're not trying to copy others, whether that's linear cable, there's lots of things we don't do. We don't do [live] news, we don't do [live] sports. But what we do do, we try to do really well."_ - Reed Hastings.
Arguably it could be used as a guaranteed self-marketing platform (in the intimate/most core sense - you can't watch if you don't have Netflix so people will just think of it as NetflixNews).
(Near) realtime would be important for emergencies, as rare as they are.
Yes it would be embarrassing and it would be an inconvenience, but nothing worse than what happened today.
Broadcast TV and Netflix have very different payment models though. Broadcasters get their money from adverts, so even a couple of minutes of outage could cost them tens of thousands in compensation to sponsors for lost ads (depending on when the outage hits, as not all ad slots are worth the same). Whereas Netflix is a subscription model: an outage is an inconvenience to their customers, but they're not directly losing money (aside from engineers' overtime fees etc). Obviously Netflix could potentially lose customers, but that's not going to happen at a high rate from a single outage like this.
This distinction can change the dynamics of how you build broadcasting infrastructure. E.g. redundant equipment is typical in any high-availability deployment, but broadcasters will not only have redundant physical equipment in different physical locations, they will often also have a second set of redundant hardware in each location purchased from different suppliers and running different software, just in case there's a software/hardware malfunction specific to one product. Whereas in internet streaming services the emphasis is more on standardising software stacks to aid scaling, which makes total sense for cloud services, but does still leave you with a potential point of failure (e.g. poorly tested Puppet or Terraform code getting deployed to prod).
Local US stations might not have the same level of redundancy as their national counterparts, but like the distinction between Netflix and traditional broadcasting, it's a financial one: the ad revenue of local vs national broadcasters differs massively. Ultimately, the more costly it is for your service to be off the air, the more you'd expect to invest in your infrastructure to make sure you don't have any such outages.
I hope not, that has echoes of Yahoo. It's bad enough their streaming library is so thin; I'd rather see more content there than have them create their own new line of content.
Their content strategy (and Amazon's) seems to be evolving toward developing their own content library, much like HBO. In theory, it frees Netflix of over-dependence on networks and studios. In practice, we get crappy shows and standup specials.
My watch-list just keeps growing. I don’t know if I’ll ever be able to catch up.
Try a higher-end streaming device that's wired to your router.
Netflix has spent A LOT of time and money on their infrastructure, expounding their engineering views, proclaiming best practices, and explaining how they use redundancy, sharding, and every other best practice and major buzzword to reduce the chance of this happening.
I'm a huge fan of Netflix and their approaches to infrastructure, and I'm almost certain that they'll have some very interesting conclusions from this.
I mean, after all, the internet's true natural predator still lurks, its yellow CAT steel present across all of the nation...
I speak of course, of Backhoes.
Employees, CEOs and businesses in general like generating cash and/or getting paid. If an ad network being down hinders their ability to do so, then that seems pretty critical to me.
Sorry, but I really have no sympathy for people that happily waste my time by showing me irrelevant content, trying to sell me garbage, or flooding my mailbox at every opportunity (the only reason my mailbox is clean is because I am allergic to any web form that asks for my email).
I don't have anything against Google per se, by the way. I'd be happy to pay for their services should they allow me to do so.
And that we would be unlikely to have such things in a pay for everything model?
That's actually a commonly-heard phrase in the US around Superbowl time.
Are you referring to the 17 minutes per hour of actual gameplay, the 21 minutes of rambling commentary/sports celebrity gossip, or the 22 minutes of ad spots?
I think there was a massive marketing campaign to get people to pay attention to superbowl ads claiming they are funny.
They've just turned into ads in recent years.
My company ended up having an equipment failure that took out both production servers and our internal support system at the same time. Lots of pale faces that day. What has two thumbs and the bureaucratic bullshit that put runbooks and deployment tools on production hardware? This guy.
My bet is that whatever caused this problem, someone got a raise last year for doing it.
If Chaos Monkey had been responsible for setting off a global outage, I could imagine business leaders getting cold feet about using a tool like this. In traditional companies, anyway, they'd never have seen the benefit of it, and after only hearing the costs, they'd probably be livid that a widespread outage had been caused by something like this.
> Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire Amazon availability zone.
A widespread outage is exactly what something like Chaos Monkey is intended to help prevent. Even if this were Chaos Monkey-induced (seems very unlikely), they've gained far more stability by having it than they would have lost in a single outage.
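For anyone who hasn't looked at how it works, the core idea is tiny: pick a random instance out of a target group and kill it, so any service that can't survive that gets found out early, on your schedule. A toy sketch along those lines (using boto3; the "chaos-optin" tag, the region, and the opt-in scheme are invented here, this is not Netflix's actual implementation):

    import random
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Find running instances that have opted in to chaos testing.
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:chaos-optin", "Values": ["true"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]

    instances = [i["InstanceId"] for r in reservations for i in r["Instances"]]

    if instances:
        victim = random.choice(instances)
        print(f"Terminating {victim}")
        ec2.terminate_instances(InstanceIds=[victim])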
One could make an Arrested Development episode where George Sr. and Lucille hear about this concept and decide to apply it to their children.
(Edit: the Arrested Development <-> Netflix tie in didn't even occur to me until after writing the comment.)
Big companies do, though they call them interns.
Resilient systems take longer to build, and thus are more costly. It is not always wise to spend that much resources on this for, say, a startup.
After a startup is crushing it, then maybe they can start rewriting any brittle monolithic systems into something more resilient.
Hopefully it wasn't a Hurricane's Butterfly kind of deal...
Similar idea: you're in a hurricane, but it may have been caused by an insignificant butterfly far away.
1. This is a global outage. Net neutrality is down in the US only.
2. The pattern of failure is not consistent with American ISPs past attempts. They've wanted to make Netflix less attractive compared to their services, not outright broken. Subtlety is better because outright breakage brings too much attention.
I've tried to get Netflix's OSS tools like Spinnaker running and it's a total nightmare with how many interdependent services need to run. It took me days to get it running and it was never reliable. I think they drank a little too much Kool-Aid.
Microservices + async DB updates = hell. After working on such projects, I respect the wisdom of Google making all of their data stores immediately consistent.
A monolith isn't bad at all if it's organized and built to scale horizontally. Move your consistency concerns back to the database where they belong, not between services. The idea that monoliths are inherently bad is a joke when you look at every popular operating system kernel: these things are multi-million-line binary blobs written in languages that aren't friendly to mistakes, and they run EVERYTHING.
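To make that concrete, here's a minimal sketch of what "consistency in the database" buys you, using sqlite3 from the standard library (the accounts table and amounts are made up): two related writes either commit together or roll back together, with no cross-service coordination code at all.

    import sqlite3

    conn = sqlite3.connect("app.db")
    try:
        with conn:  # one transaction: commits on success, rolls back on any exception
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - 10 "
                "WHERE id = ? AND balance >= 10",
                ("alice",),
            )
            if cur.rowcount != 1:
                raise RuntimeError("insufficient balance, aborting")
            conn.execute(
                "UPDATE accounts SET balance = balance + 10 WHERE id = ?",
                ("bob",),
            )
    except Exception as exc:
        print(f"Transfer rolled back, nothing was written: {exc}")
    finally:
        conn.close()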
You may want to check out Armory Spinnaker. It's a commercial version of Spinnaker that takes care of all the hard bits of setting it up.
Disclaimer: I'm an investor in Armory
We'd love to hear how it goes for you!
Netflix has built an amazing system for messing with their servers to try to find failure points. And they have a bunch of cleanup jobs that run to fix consistency errors. This is just what I picked up from a past obsession with their architecture.
Then I worked on a few projects built that way, and realized it was a horrible nightmare. By having a global immediately consistent datastore you can push all these concerns back to the database, and your codebase ends up far smaller.
The failure modes tend to explode in complexity when you're doing RPC across services with different datastores, because you have to deal with distributed transactions yourself. Every single call needs to be able to unwind itself across all services.
If you have a call that fans out to 10 other service calls, you need to be able to unwind any of them, in any order. It quickly becomes untenable.
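That unwinding logic ends up looking something like the following saga-style sketch (the service URLs and endpoints are invented for illustration; a real version also needs durable state, retries, and idempotent compensations, which is exactly where the complexity explodes):

    import requests

    # Each step pairs a forward call with the compensation that undoes it.
    STEPS = [
        ("http://payments/charge",   "http://payments/refund"),
        ("http://inventory/reserve", "http://inventory/release"),
        ("http://shipping/schedule", "http://shipping/cancel"),
    ]

    def place_order(order):
        completed = []
        try:
            for do_url, undo_url in STEPS:
                requests.post(do_url, json=order, timeout=5).raise_for_status()
                completed.append(undo_url)
        except Exception:
            # Unwind in reverse order; each compensation can itself fail,
            # which would need its own retry/alerting story.
            for undo_url in reversed(completed):
                try:
                    requests.post(undo_url, json=order, timeout=5).raise_for_status()
                except Exception:
                    pass
            raise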