Edit: Since it's topical, I enjoy listening to Chuck Rossi in interviews or presentations.
Releng 2014 - Keynote 1: Chuck Rossi, Release Engineering, Facebook Inc. | Talks at Google 
> they could cause a large enough DDoS to affect the
> massive juggernaut that is Facebook
There's (I assume) no chance that it's been done using Slow Loris, but that's a good example of doing something unusual to deplete a resource in an unusual way.
I would tend to agree that it's unlikely Facebook have simply been flooded off the internet, but there are many other ways to perform much more targeted DDoS attacks, and presumably Facebook haven't mitigated against _all_ of them.
However, if you could cripple DNS propagation/resolution from their systems to the next-hop upstreams, it would show up as services being very slow to respond. They do serve content from multiple domains.
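One rough way to see whether name resolution is the slow part, as opposed to the servers themselves, is to time lookups for each domain separately. This is only a sketch: the helper name `time_resolution` is made up for illustration, the two hostnames are simply the ones mentioned in this thread, and OS-level or resolver-level caching will hide a lot of what actually happens upstream.

```python
import socket
import time


def time_resolution(hostname, resolver=socket.getaddrinfo):
    """Return (seconds_elapsed, resolved_ok) for a single lookup of hostname.

    `resolver` is injectable so the timing logic can be exercised offline;
    by default it is the standard library's socket.getaddrinfo.
    """
    start = time.perf_counter()
    try:
        resolver(hostname, 443)
        ok = True
    except socket.gaierror:
        # Resolution failed outright (NXDOMAIN, resolver unreachable, ...).
        ok = False
    return time.perf_counter() - start, ok


if __name__ == "__main__":
    # Hostnames taken from the thread; any other domains would be guesses.
    for host in ("facebook.com", "instagram.com"):
        elapsed, ok = time_resolution(host)
        status = "resolved" if ok else "FAILED"
        print(f"{host}: {status} in {elapsed * 1000:.1f} ms")
```

If lookups fail or take seconds while a direct connection to a known-good IP is fast, that points at DNS rather than at the service behind it.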
I'd be interested to know how Google's 8.8.8.8 infrastructure defends them from DNS level manipulation...
I guess we'll just have to wait and see if Facebook makes any public statements as to what the cause of the outage was.
> seems like something that would bolster their rep
I'm guessing the multiple services going down at once has to do with Akamai. It seems like there's some speculation about the storm on the east coast and their Boston datacenter.
Though, as has been said before, anyone can claim responsibility. We'll soon see.
E: not that I support these guys in any way. Just curious what would happen -- and whether we might all learn something about DDoS mitigation from such an event.
nostrademons's explanation of SRE is the correct one, IMO. The architecture is key. Engineering has to be built to allow that. It has helped me in the past to say SREs are concerned more with the operation of a service than a group of machines offering a service; it's almost like a service operations developer. When a company thinks in terms of services and abstracts the machine away (e.g., containers, scheduling, Mesos, Omega/<unnamed>, intelligent CI/CD, service discovery), now you're getting into SRE territory instead of SA territory. The architecture involvement distinguishes SRE from devops for me. You should be able to trust SRE to build services, not just run engineering output.
Teams that congeal out of Xooglers tend to preach SRE well, and there is the occasional company (Twitter and Foursquare come to mind) that applies the title and interacts with the team as intended.
No hints there, guess we'll just have to wait for a full post-mortem to come out.
"Current State: Fix Pushed
Facebook and Instagram experienced a major outage tonight from 22:10 until 23:10 PST. Our engineers identified the cause of the outage and recovered the site quickly. You should now see decreasing error rates while our systems stabilize. We don't expect any other break in service. I'll post another update within 30 mins. Thank you for your patience."
Might not be a hack! Might just be weather in Boston:
"Akamai (provider for FB, Instagram and so on) claims to be down due to power outage. #LizardSquad claims to have hacked them. Get popcorn."
Edit: someone wrote this below in the comments, which came to $400 per second of downtime. Ouch.
(source for cost: Cormorant Alpha shutdown in 2013)
Edit: misplaced parenthesis => my first number was way off, $70 000 vs $100. Redid analysis, made mental note not to do math before coffee intake.
$70,000 per second = roughly $2.2 trillion a year, assuming 24/365 usage.
Total value of all oil produced in the world (assuming $50/barrel) is roughly $1.4 trillion a year.
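A quick sanity check of the figure above, as a sketch. It assumes the $70,000 is a per-second cost (the comment does not state the unit) and takes the $1.4 trillion oil estimate at face value; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, i.e. "24/365 usage"

# Assumption: the $70,000 figure is a cost per second of downtime.
SUSPECT_COST_PER_SECOND_USD = 70_000
# Estimate quoted in the thread for world oil production at $50/barrel.
WORLD_OIL_VALUE_PER_YEAR_USD = 1.4e12


def annualized(cost_per_second):
    """Scale a per-second dollar figure up to a full year."""
    return cost_per_second * SECONDS_PER_YEAR


if __name__ == "__main__":
    yearly = annualized(SUSPECT_COST_PER_SECOND_USD)
    ratio = yearly / WORLD_OIL_VALUE_PER_YEAR_USD
    print(f"Annualized: ${yearly / 1e12:.2f} trillion")
    print(f"That is {ratio:.2f}x the quoted value of world oil production")
```

An annualized figure larger than the whole world's oil output is a good hint that the per-second number was off, which is what the misplaced parenthesis turned out to be.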
It's not just you! http://facebook.com looks down from here.
Was using both when they became unresponsive, resetting routers all over the shop.
I wonder what makes FB and Instagram different?
Honestly, I doubt it was an Akamai issue. If Akamai experienced network problems, dozens or hundreds of sites would be affected. If an Akamai config issue (i.e., human error) were to blame, then it would probably only affect one site, not several. Neither FB nor Akamai is dumb enough to push multiple site changes at once.
Edit: using more up to date data:
$3,300,000,000 in the 3rd quarter of 2014 / (365/4) / 24 / 60 / 60 = $418.57 per second
But considering this is not the peak time in North America, it should be a little less.
When Facebook comes back up, people will most likely catch up on "lost Facebook time". Over the course of the week (or even the day) it will most likely even out.
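The per-second estimate above can be reproduced with a few lines, as a back-of-the-envelope sketch. It uses the quarterly revenue figure quoted in the comment and assumes revenue is spread evenly over the quarter, which is exactly the simplification the peak-time caveat is about. The function names are made up for illustration.

```python
# Revenue figure as quoted in the comment for Q3 2014.
QUARTERLY_REVENUE_USD = 3_300_000_000
# A quarter taken as 365/4 = 91.25 days, matching the comment's arithmetic.
SECONDS_PER_QUARTER = (365 / 4) * 24 * 60 * 60


def revenue_per_second(quarterly_revenue, seconds=SECONDS_PER_QUARTER):
    """Average revenue per second, assuming it is uniform over the quarter."""
    return quarterly_revenue / seconds


def outage_cost(quarterly_revenue, outage_seconds):
    """Naive revenue lost during an outage of the given length."""
    return revenue_per_second(quarterly_revenue) * outage_seconds


if __name__ == "__main__":
    per_second = revenue_per_second(QUARTERLY_REVENUE_USD)
    print(f"${per_second:,.2f} per second")
    # The status update in this thread reports roughly a one-hour outage.
    one_hour = outage_cost(QUARTERLY_REVENUE_USD, 60 * 60)
    print(f"${one_hour:,.0f} for a one-hour outage")
```

This is an upper bound at best: it ignores time of day and any usage that is merely deferred until the site comes back.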
As others have remarked, one way to interpret this is as a testament to their ops team's exceptional ability as engineers: downtime is this noticeable because it is so rare.
Well, that's what Zuckerberg answered when a user asked why Facebook wasn't cool anymore. He didn't want it to be cool, but more like a digital utility, like water or electricity.
whois facebook.com @126.96.36.199
Whois Server Version 2.0
Domain names in the .com and .net domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
To single out one record, look it up with "xxx", where xxx is one of the
records displayed above. If the records are the same, look them up
with "=xxx" to receive a full display for each record.
>>> Last update of whois database: Tue, 27 Jan 2015 06:56:04 GMT <<<
As for reliability, I've been using Slack for months without a single outage. If it went down for a day, the impact would be minimal -- I'd just use email and communicate less for that one day. Or I'd switch to another provider. Big deal.
Anyway, if that is true, people have to take it really seriously this time.
Forget Sony. Nobody outside of Hollywood bigwigs was impacted by that. But Facebook? The US government is going to go nuts. I hope lizardsquad are driving drone-proof cars.
Facebook is now back online for me on my home connection (BC Canada) and every Tor node I try. Either the attack is over or FB has fought back.
"@LizardMafia: More to come soon. Side note: We're still organizing the @MAS email dump, stay tuned for that."
It's okay not to like Facebook, and there are many good reasons to be sceptical about them as a company, but to call them pointless is so utterly myopic that it beggars belief.
Perhaps this is a sign that I should go back to blocking myself from Facebook with StayFocusd.
Kind of sad that I needed this kick in the backside, but I'll take this as an opportunity to reclaim empty calories and wasted time.
I feel like this is more the norm rather than an exception for them at this point, though.
It has never been down. Ever.