Edit: Since it's topical, I enjoy listening to Chuck Rossi in interviews or presentations.
Releng 2014 - Keynote 1: Chuck Rossi, Release Engineering, Facebook Inc. | Talks at Google 
> they could cause a large enough DDoS to affect the
> massive juggernaut that is Facebook
There's (I assume) no chance it was done using Slowloris, but that's a good example of doing something unusual to deplete a resource in an unexpected way.
I would tend to agree that it's unlikely Facebook have simply been flooded off the internet, but there are many other ways to perform much more targeted DDoS attacks, and presumably Facebook haven't mitigated against _all_ of them.
However, if you could cripple DNS propagation/resolution from their systems to the next-hop upstreams, it would show up as services being very slow to respond. They do serve content from multiple domains.
I'd be interested to know how Google's 8.8.8.8 infrastructure defends them from DNS-level manipulation...
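The DNS angle is at least easy to eyeball from the client side. A minimal sketch (the domain list is just an example, not from any Facebook documentation) that times resolution through the system resolver with the standard library:

```python
import socket
import time

def dns_lookup_time(hostname):
    """Time a single resolution via the system resolver, in milliseconds."""
    start = time.monotonic()
    socket.getaddrinfo(hostname, 443)
    return (time.monotonic() - start) * 1000.0

# Facebook serves content from several domains; slow lookups across all of
# them would suggest a resolver/propagation problem rather than one dead host.
for host in ("facebook.com", "fbcdn.net", "instagram.com"):
    try:
        print(f"{host}: {dns_lookup_time(host):.1f} ms")
    except socket.gaierror as exc:
        print(f"{host}: resolution failed ({exc})")
```

Consistently slow or failing lookups across unrelated domains would point at the resolver path rather than at Facebook's servers themselves.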
I guess we'll just have to wait and see if Facebook makes any public statements as to what the cause of the outage was.
> seems like something that would bolster their rep
I'm guessing the multiple services going down at once has to do with Akamai. It seems like there's some speculation about the storm on the east coast and their Boston datacenter.
Though, as has been said before, anyone can claim responsibility. We'll soon see.
E: not that I support these guys in any way. Just curious what would happen -- and whether we might all learn something about DDoS mitigation from such an event.
nostrademons's explanation of SRE is the correct one, IMO. The architecture is key, and engineering has to be built to allow it. It has helped me in the past to say that SREs are concerned with the operation of a service rather than with a group of machines offering a service; it's almost like being a service-operations developer. When a company thinks in terms of services and abstracts the machine away (e.g., containers, scheduling, Mesos, Omega/<unnamed>, intelligent CI/CD, service discovery), you're getting into SRE territory instead of SA territory. The architecture involvement is what distinguishes SRE from devops for me: you should be able to trust SRE to build services, not just run engineering output.
Teams that congeal out of Xooglers tend to preach SRE well, and there is the occasional company (Twitter and Foursquare come to mind) that applies the title and interacts with the team as intended.
No hints there, guess we'll just have to wait for a full post-mortem to come out.
"Current State: Fix Pushed
Facebook and Instagram experienced a major outage tonight from 22:10 until 23:10 PST. Our engineers identified the cause of the outage and recovered the site quickly. You should now see decreasing error rates while our systems stabilize. We don't expect any other break in service. I'll post another update within 30 mins. Thank you for your patience."
Might not be a hack! Might just be weather in Boston:
"Akamai (provider for FB, Instagram and so on) claims to be down due to power outage. #LizardSquad claims to have hacked them. Get popcorn."
Edit: someone below in the comments did this calculation, which came to $400 per second of downtime. Ouch.
(source for cost: Cormorant Alpha shutdown in 2013)
Edit: misplaced parenthesis => my first number was way off, $70,000 vs $100. Redid the analysis, made a mental note not to do math before coffee intake.
$70,000 per second works out to roughly $2.2 trillion a year, assuming 24/365 usage.
Total value of all oil produced in the world (assuming $50/barrel) is roughly $1.4 trillion a year.
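The comparison above is easy to sanity-check; a quick back-of-the-envelope (the world oil output figure of ~80 million barrels/day is my assumption, not from the comment):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60          # 31,536,000

# $70,000/s, run around the clock for a year
fb_annual = 70_000 * SECONDS_PER_YEAR          # ≈ $2.21 trillion

# World oil production: ~80M barrels/day at $50/barrel (assumed figures)
oil_annual = 80_000_000 * 50 * 365             # ≈ $1.46 trillion

print(f"${fb_annual / 1e12:.2f}T vs ${oil_annual / 1e12:.2f}T")
```

Which is the point: a $70,000/s figure would make Facebook's downtime cost exceed the value of the entire world oil supply, so something was off in the original calculation.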
It's not just you! http://facebook.com looks down from here.
Was using both when they became unresponsive, resetting routers all over the shop.
I wonder what makes FB and Instagram different?
Honestly, I doubt it was an Akamai issue. If Akamai experienced network problems, dozens or hundreds of sites would be affected. If an Akamai config issue (i.e., human error) were to blame, it would probably only affect one site, not several. Neither FB nor Akamai is dumb enough to push changes to multiple sites at once.
Edit: using more up-to-date data:
$3,300,000,000 in the 3rd quarter of 2014 / (365/4) / 24 / 60 / 60 = $418.58
But considering this is not peak time in North America, it should be a little less.
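For reference, the arithmetic above in one place (Q3 2014 revenue spread evenly over a quarter of a year):

```python
quarterly_revenue = 3_300_000_000               # Facebook Q3 2014 revenue, USD
seconds_per_quarter = (365 / 4) * 24 * 60 * 60  # 7,884,000 seconds

per_second = quarterly_revenue / seconds_per_quarter
print(f"${per_second:.2f} per second")
```

So roughly $420/s of revenue at risk for every second of downtime, before accounting for time of day.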
When Facebook comes back up, people will most likely catch up on "lost Facebook time". Over the course of the week (or even the day) it will even out.
As others have remarked, one way to interpret this is as a testament to their ops engineers' exceptional ability: downtime like this is so noticeable precisely because it is so rare.
Well, that's what Zuckerberg answered when a user asked why Facebook wasn't cool anymore. He didn't want it to be cool, but rather a digital utility, like water or electricity.
whois facebook.com @18.104.22.168
Whois Server Version 2.0
Domain names in the .com and .net domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.
To single out one record, look it up with "xxx", where xxx is one of the
records displayed above. If the records are the same, look them up
with "=xxx" to receive a full display for each record.
>>> Last update of whois database: Tue, 27 Jan 2015 06:56:04 GMT <<<
As for reliability, I've been using Slack for months without a single outage. If it went down for a day, the impact would be minimal -- I'd just use email and communicate less for that one day. Or I'd switch to another provider. Big deal.
Anyway, if that is true, people have to take it really seriously this time.
Forget Sony. Nobody outside of Hollywood bigwigs was impacted by that. But Facebook? The US government is going to go nuts. I hope LizardSquad are driving drone-proof cars.
Facebook is now back online for me on my home connection (BC Canada) and every Tor node I try. Either the attack is over or FB has fought back.
"@LizardMafia: More to come soon. Side note: We're still organizing the @MAS email dump, stay tuned for that."
It's okay not to like Facebook, and there are many good reasons to be sceptical about them as a company, but to call them pointless is so utterly myopic that it beggars belief.
Perhaps this is a sign that I should go back to blocking myself from Facebook with StayFocusd.
Kind of sad that I needed this kick in the backside, but I'll take this as an opportunity to reclaim empty calories and wasted time.
I feel like this is more the norm rather than an exception for them at this point, though.
It has never been down. Ever.
But sure, couple of script kiddies might be responsible as well. It's usually one or the other.
"Host unreachable", maybe they took out the load-balancer?
- traceroute to star.c10r.facebook.com (2a03:2880:f00c:6:face:b00c:0:2)
2 ge-1-0-0.bb1.a.syd.aarnet.net.au (2001:388:1:5001::1) 107.649 ms 91.569 ms 91.814 ms
3 ae9.pe2.brwy.nsw.aarnet.net.au (2001:388:1:88::1) 91.981 ms 92.822 ms 103.957 ms
4 ae5.pe1.brwy.nsw.aarnet.net.au (2001:388:1:87::1) 92.447 ms 93.29 ms 101.182 ms
5 et-1-1-0.pe1.rsby.nsw.aarnet.net.au (2001:388:1:66::1) 94.606 ms 109.654 ms 91.987 ms
6 et-0-3-0.nsw-msct-bdr1.aarnet.net.au (2001:388:1:a3::2) 96.119 ms 92.032 ms 92.57 ms
7 6453.syd.equinix.com (2001:de8:6::6453:1) 99.204 ms 111.52 ms 190.499 ms
8 if-xe-0-3-1.3.thar1.1MH-Sydney.ipv6.as6453.net (2405:2000:ffd0::a) 90.392 ms 90.862 ms 92.174 ms
9 if-3-0-0.2.core1.PV4-Piti.ipv6.as6453.net (2405:2000:ffd0::1a) 232.07 ms 210.376 ms 163.293 ms
10 if-xe-3-1-1.10.tcore1.TV2-Tokyo.ipv6.as6453.net (2405:2000:ffb::22) 191.668 ms 200.129 ms 226.642 ms
11 if-ae2.2.tcore2.TV2-Tokyo.ipv6.as6453.net (2001:5a0:2200:300::2) 192.905 ms 191.953 ms 201.681 ms
12 if-ae6.2.tcore1.SVW-Singapore.ipv6.as6453.net (2405:2000:ffa0:100::49) 288.696 ms 279.671 ms 269.912 ms
13 if-ae11.2.thar1.SVQ-Singapore.ipv6.as6453.net (2405:2000:300:100::d) 264.938 ms 264.522 ms 266.912 ms
14 2405:2000:300:100::16 (2405:2000:300:100::16) 354.279 ms 349.7 ms 389.75 ms
15 ae2.bb02.sin1.tfbnw.net (2620:0:1cff:dead:beef::84c) 349.649 ms 348.205 ms 347.904 ms
16 ae0.bb01.hkg1.tfbnw.net (2620:0:1cff:dead:beef::1bdc) 349.552 ms 356.726 ms 348.74 ms
17 be6.bb01.pdx1.tfbnw.net (2620:0:1cff:dead:beef::5dd) 366.212 ms 368.99 ms 368.89 ms
18 be9.bb01.prn2.tfbnw.net (2620:0:1cff:dead:beef::f5) 365.81 ms 366.327 ms 366.598 ms
19 ae10.dr02.prn1.tfbnw.net (2620:0:1cff:dead:beef::1c27) 377.387 ms 365.613 ms 365.392 ms
23 * * *
24 ae10.dr02.prn1.tfbnw.net (2620:0:1cff:dead:beef::1c27) 387.156 ms !H 383.877 ms !H 374.672 ms !H
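The `!H` flags on the final hop are the interesting bit: "host unreachable" coming back from Facebook's own border router, meaning packets made it into their network and were then rejected rather than silently dropped. A small sketch (the line format is assumed to match the traceroute output above) that picks those flags out of a trace:

```python
import re

# Matches a traceroute hop line: hop number, hostname, (address), then the
# RTT fields, which may carry annotations such as !H (host unreachable).
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([^)]+)\)\s+(.*)$")

def unreachable_hops(trace_text):
    """Return (hop, host) pairs whose probes came back flagged !H."""
    flagged = []
    for line in trace_text.splitlines():
        m = HOP_RE.match(line)
        if m and "!H" in m.group(4):
            flagged.append((int(m.group(1)), m.group(2)))
    return flagged

sample = """\
19  ae10.dr02.prn1.tfbnw.net (2620:0:1cff:dead:beef::1c27)  377.387 ms  365.613 ms  365.392 ms
23  * * *
24  ae10.dr02.prn1.tfbnw.net (2620:0:1cff:dead:beef::1c27)  387.156 ms !H  383.877 ms !H  374.672 ms !H
"""
print(unreachable_hops(sample))  # [(24, 'ae10.dr02.prn1.tfbnw.net')]
```

That pattern (reachable border router, unreachable destination behind it) is more consistent with Facebook's own infrastructure refusing traffic than with a network path failure.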
In that sense, it's very possible that he has access to some major database dumps the public is unaware of, and given that he is also claiming credit for the Malaysia Airlines hack, it's clear that he does more than DDoS, no matter what the more educated hackers are saying.
A long way to go for a secure internet, that's for sure.