People would be reactivating their Facebook accounts and having to sift through conspiracy-theory posts, still about Hillary Clinton, just to figure out what was going on.
Edit: The points on this post keep going up and down every time I check these comments. Yes, it was sarcasm, I was joking, but I was trying to point out that most people rely on a small set of services. "Cloud" has centralized things a lot.
Whenever we had any sort of issue we could generally get a good idea of what was happening by looking at changes in traffic in those two web tiers.
If people couldn't play for almost any reason, game-action traffic would drop to near zero, but static asset tier traffic would usually at least triple.
So yeah, there are a lot of F5 buttons being hit out there when pages don't load.
It was not fully down; there were issues with their blobstore.
Google+ APIs were shut down on 9 March 2019
Customers in 1999 really couldn't believe no one had replied to their emails within a day or two.
Wait, people are doing this already:
Which is why I have to refrain from taking an "over the top much?" slant when people post the pro-Trump/victim-of-the-left type memes, since I don't see a ton of attacks on him directly. But then again, their feed could be totally different from mine, so who knows?
I think the real bummer is when you present the actual video of what the person says, and the response is essentially “yeah? Well, they still suck” or something in that ballpark. I have zero issue with someone not liking another person’s views, but a lot of it is just outright libel.
Google, sure, but who in the real world cares about Twitter?
Twitter could be down for days and only the technocracy would notice.
It also seems popular with journalists and media companies (e.g. TV shows asking viewers to "tweet us your questions")
I was out shoveling, and came back in to my phone blowing up. Our systems at IronMountain (formerly Fortrust) in Denver all rebooted at once. These are all on redundant power: each system's redundant power supplies connect to different circuits entering the cabinet, and those two circuits are fed from 3 PDUs (two separate, one shared). Each of those is supposed to be fed by a separate UPS and generator. The last status update I had says that they are running off generators, but they've been shockingly tight-lipped about it.
Don't get me wrong, it was hi-LAR-ious to call into their NOC and have them pretend that I was the only one having problems. "Can you tell me if there is a major data center outage going on?" "We are trying to gather information, we are making a bunch of client phone calls, we will know after we make those calls." "... Why are you making a bunch of client calls if you aren't having an outage?"
They do run quarterly 'storms' where a datacenter is shut down to test failover and resiliency. I have no idea if today is one of those days, since I left last year.
For instance, GitHub's relatively recent outage was due to a failover heartbeat not going as expected.
I think that is a yes, and he's getting ahead of it by saying "Yes, and we have no idea why or an ETA, so let us do our job".
Granted, they should have a status page.
On the other hand, I need information to be able to do my job: Is this only our cabinet having problems and I need to start rolling to the datacenter (in the middle of a giant blizzard)? Is this possibly some sort of problem with our own power infrastructure? Is something on fire (an EPO triggered by fire could cause this)? Did the roof cave in under the weight of the snow we are getting? Is the power stabilized or is there some indication that power might be up and down?
In short, I need answers to: Do I need to gracefully take down my site to prevent lost transactions and database corruption? Do I need to switch to our backup site?
For context: All of our servers powering off at once and then back on shouldn't be possible. It should require the failure of at least 3 independent pieces of equipment (except at the breaker panel or in our cabinet, where it could be only two failures). It is extremely unusual for this to happen; it's the first time it's happened to me, and I've been in that facility since 2004.
So, yes, I respect that you need to do your job. But I also need to do my job.
Plus, I'm pretty sure the guy answering the trouble line, his job WAS talking with the customers. The people working the problem likely didn't include him. This is a huge data center run by a ginormous company. I don't think I was taking him away from twisting a wrench. :-)
But that's my presumption, I don't actually know anything and don't want to imply I do.
It might be resolved already, or it might have to get worse before they escalate it further. They might not know the full facts, and it might look worse than it really is. How do you know? You can't judge that just because your personal rendering of Facebook failed. You have load balancers and CDNs and A/B testers all getting in the way of delivering data to your machine.
It's too easy to draw a conclusion from the client-side armchair and the provider is absolutely not going to make false promises, for the worse or for the better.
You want to hope that Facebook, in this case, acts on more complete information.
Paperweight contracts are irrelevant in a world of data
* A Smart Contract is cheaper to publish than the stack of paper handled by lawyers.
* Code is cheap to iterate on, whereas traditional SLAs are expensive and slow to renegotiate.
Over time, SLAs drive behaviors that are focused on delivering a minimum level of service at minimum cost to the provider.
* A Smart Contract is code you can trust and understand, and you can expect it to execute instantly, compared to a traditional SLM process.
* A Smart Contract is immutable.
More about the benefits of Smart Contracts over paper/digital agreements:
Why not a python script?
I don't trust the guy that handles that script. Blockchain is very good at bringing trust.
I hope I answered your question
Feel free to give that whitepaper a look
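For what it's worth, here is roughly what that python-script alternative could look like: a minimal, hypothetical sketch that turns measured uptime into a service credit. The tiers and thresholds are invented for illustration, not any real provider's terms.

    # Hypothetical SLA-credit calculator -- the kind of "python script"
    # a Smart Contract would replace. Tiers are illustrative only.

    def monthly_uptime(total_minutes: int, downtime_minutes: int) -> float:
        """Uptime for the billing period, as a percentage."""
        return 100.0 * (total_minutes - downtime_minutes) / total_minutes

    def sla_credit(uptime_pct: float) -> float:
        """Map measured uptime to a service-credit percentage."""
        if uptime_pct >= 99.95:
            return 0.0    # SLA met, no credit owed
        if uptime_pct >= 99.0:
            return 10.0   # partial breach
        return 30.0       # major breach

    if __name__ == "__main__":
        minutes = 30 * 24 * 60                           # a 30-day month
        up = monthly_uptime(minutes, downtime_minutes=180)
        print(f"uptime {up:.3f}% -> credit {sla_credit(up):.0f}%")

Of course, the trust objection above still applies: whoever runs this script also controls its inputs and whether the credit is ever honored. The blockchain pitch is about removing that single operator, not about the arithmetic.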
The kind that will make a lot of noise as publicly as possible and create ample work for your support/admin people if you don't keep them happy...
> doesn’t sound like a customer you would want to be in business with
I could say that about most of the companies I have had the dubious pleasure of doing business with! Very few are pleasant when something goes awry even for a moment.
Deny, deny, deny, obfuscate, deny, then blame someone else (usually, YOU).
I'm curious too, and have been pondering this for quite a while: what, how, and when will we see a disruption of this?
Given the proliferation of minute/hourly billing among service providers, it looks like the Multics folks guessed right. It just happened on top of Unix(-like systems) instead of Multics.
I wonder how long it'll be before we start seeing municipal datacenters?
EDIT: apparently, though, HHVM stopped supporting PHP itself last month; now it only supports Hack. I'm not familiar enough with Hack to know how much it actually deviates from / improves upon PHP.
This leaves me wondering what software all these places have in common. The application layers are all different, the databases are all different, the containerization and provisioning systems are different, but I imagine that all these systems rely on two things: the global Internet backbone, and maybe the Linux kernel.
Have there been major security vulnerabilities patched lately in the Linux kernel that could have had unintended consequences?
It's telling that one of the hottest areas of distributed-systems research these days is the boring topic of configuration management. Google, Microsoft, etc. are paying researchers top dollar to figure out how to prevent massive outages through novel techniques. It is one of the harder problems to solve and requires massive investment in tooling, refactoring, etc.
Curious what makes you think this. Are there specific job postings in either company that are focused on this?
 - https://www.microsoft.com/en-us/research/blog/eliminating-ne...
Sometimes you just get unlucky!
-- Ian Fleming (in Goldfinger)
>This leaves me wondering what software all these places have in common.
dunno what systems you're talking about, but seems likely they are mostly x86 systems and maybe even mostly using Intel hardware and microcode
those systems can be (and are) rooted, and more, to my knowledge
Cisco or Arista
I run a messenger bot platform - the webhooks stopped being delivered _hours_ ago... nothing on their status page until it had been down for hours.
Their current issue...
"We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution."
As for the advertisers, I doubt they'll be charged for impressions that didn't happen.
I strongly suspect users are reporting "my Internet is having troubles" because their FB, Messenger, etc. isn't working right.
For example, in the comments of the T-Mobile outage page, there's stuff like "Haven't been able to upload anything to social media all day" and "Cannot send pictures through whatsapp and fb messenger".
Also, check out the "Attacks" tab. That one really lights up. Like seriously lights up. Something is going on... all over. US, China, Russia, EU...
It "lights up" because there are always attacks happening.
The red are the areas with the most attacks, and as you'd expect, they correspond to large population centers. (It's also not very granular, and appears to largely correspond to "where does Akamai have a datacenter".)
I thought maybe my ISP had blocked a port that these services might be transferring their multimedia over.
Now, another interpretation is that the reports are simply false...
Edit: it's back now (8:37 PM UTC)
I can't wait to see the RCA for both of these and if they're related.
Public post-mortem:
Entirely believable technical cause.
(Ignore Stuxnet, Ignore DUQU)
It's the rice grain implants, man
The beam splitter stuff (e.g. Room 641A) went by different codenames, TRAFFICTHIEF and TURMOIL iirc. That's the back door.
IIRC, the NSA is organizationally part of the military, and it's currently headed by a military officer who gives congressional testimony in his uniform (https://www.youtube.com/watch?v=nMi241XLeQ8). It makes sense they'd build on military bases, it'd be kinda weird if they didn't.
Also, the majority of tech and data companies have closed this loophole by encrypting traffic between data centers. Nobody thought it was necessary over dark fiber before because, hey, who was listening? (Answer: the NSA was.)
> The New York Times reported in 2005 that the USS Jimmy Carter, a highly advanced submarine that was the only one of its class built, had a capability to tap undersea cables. An Institute of Electrical and Electronics Engineers report speculated that a 45-foot extension added to the Jimmy Carter provided this capability by allowing engineers to bring the cable up into a floodable chamber to install a tap. But it is unlikely that the USS Jimmy Carter routinely taps cables since U.S. intelligence agencies can much more easily (and lawfully) obtain cable data through taps at above-ground cable landing stations.
I don't think he made any pre-Snowden comments on mass surveillance?
Yet another alternative: the Third World War has just started, and this was the first battle.
Don't conflate that with FB/Insta problems.
(Oh, turns out the Great Blackout Baby Boom was a myth:
I guess that this is all that I will get. Facebook is never down, it is just making improvements (like restarting the services to make them work again).
Source: Me. My career has been spent managing DBs for internet-scale sites.
Screw-ups related to data loss are rare (I've been here for years and haven't seen one with the DBs that the stuff I work with uses), but failures at this scale tend to cascade a little ways, and it takes time to dig out of the hole. They probably have the problem solved, but they have to spend a bunch of time synchronizing things and verifying the fix before they press the big red "go live" button.
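As a toy illustration of that last step (all names and thresholds invented, not how Facebook actually does it), the "verify before go live" gate might look something like:

    # Toy "safe to go live" gate: poll replica lag until the whole fleet
    # has caught up to the primary. Names and numbers are invented.

    import time

    MAX_LAG_S = 1.0        # tolerated replica lag before re-enabling traffic
    POLL_INTERVAL_S = 1.0

    def safe_to_go_live(replica_lags):
        """True once every replica's lag behind the primary is within bounds."""
        return all(lag <= MAX_LAG_S for lag in replica_lags.values())

    def wait_until_synchronized(get_lags):
        """get_lags() would query the real databases; here the caller supplies it."""
        while not safe_to_go_live(get_lags()):
            time.sleep(POLL_INTERVAL_S)   # the "digging out of the hole" part

    if __name__ == "__main__":
        lags = {"replica-a": 40.0, "replica-b": 12.0}
        def get_lags():
            for name in lags:                       # pretend replicas catch up
                lags[name] = max(0.0, lags[name] - 15.0)
            return dict(lags)
        wait_until_synchronized(get_lags)
        print('all replicas caught up -- press the big red "go live" button')

The waiting is the whole point: flipping traffic back on before the replicas converge is how you turn one outage into two.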
Amazon once pushed a seemingly-innocuous change to their internal DNS that caused all the routers between and within datacenters to drop their IP tables on the floor. They had to re-establish the entire network by hand---datacenter heads calling each other up and reading IP address ranges over the phone to be hand-entered into lookup tables. Cost a fortune in lost sales for the time the whole site was inaccessible.
Network failures are usually really bad when your system is globally deployed and distributed -- often times you can't even communicate with your machines to deliver fixes :p
Current State: Investigating
Description: We are currently experiencing issues that may cause some API requests to take longer or fail unexpectedly. We are investigating the issue and working on a resolution.
2 hours ago
about an hour ago
There are currently no updates for this issue.
Well someone tell him to stop, for pete's sake!
Edit: Or have methods other than just relying on Facebook authentication
Facebook obviously loses some ad revenue, and Facebook's customers may lose sales. But do Facebook/Instagram users suffer? How does losing social media for several hours affect their quality of life?
Actually the government did block all social media for over a month but that was fixable with vpn. (Follow hashtag #SudanUprising on twitter to learn more)
What I asked was: what is the effect of sporadic interruptions of a few hours? I mean, if Facebook had 30% availability, would I lose anything valuable from the experience? Or is it just that we are used to it and want it to be there always?
The value of 99.5% availability for _users_ is not clear to me. Instant messaging is the exception here.
I hate Facebook, but to deny its value is pretty naive.
On the other hand, if someone were to sabotage the platform and prove/convincingly argue that they induced the failure, at minimum it would do significant damage to the tech sector and at maximum cause public panic.
This is a hypothetical, not speculation on the cause of this outage.
It would be interesting to read the post-mortem regardless, if there is one.
Edit: Has anyone seen anything of this sort in any of the projects they follow?
I believe this is why GitHub's status page is now on its own domain, so a github.com DNS outage won't take it down.
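You can check that separation from the outside with dnspython (assuming the status site is githubstatus.com, which is what GitHub uses today): the two domains are served by different authoritative nameservers, so a DNS failure on one side needn't take out the other.

    # Compare the authoritative nameservers of github.com and its status page.
    # Requires dnspython (pip install dnspython); domain names are assumptions.

    import dns.resolver

    for domain in ("github.com", "githubstatus.com"):
        answers = dns.resolver.resolve(domain, "NS")
        servers = sorted(str(rr.target) for rr in answers)
        print(f"{domain}: {', '.join(servers)}")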