This sure makes it easy to know who is hosted by Google: just go to downdetector.com.
whois $(dig +short spotify.com A)
EDIT: I was wrong, as pointed out by shizcakes.
spotify.com. 172800 IN NS dns1.p07.nsone.net.
spotify.com. 172800 IN NS dns2.p07.nsone.net.
spotify.com. 172800 IN NS dns3.p07.nsone.net.
spotify.com. 172800 IN NS dns4.p07.nsone.net.
spotify.com. 172800 IN NS ns-cloud-a1.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a2.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a3.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a4.googledomains.com.
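For anyone who wants to see the split at a glance, here is a rough sketch that groups the NS records above by the parent domain of each nameserver. The function names are made up for illustration, and "take the last two DNS labels" is a naive assumption that happens to work for these hostnames; it would be wrong for suffixes like co.uk.

```python
from collections import defaultdict
from typing import Dict, List

# The NS records quoted above, copied verbatim.
NS_RECORDS = """\
spotify.com. 172800 IN NS dns1.p07.nsone.net.
spotify.com. 172800 IN NS dns2.p07.nsone.net.
spotify.com. 172800 IN NS dns3.p07.nsone.net.
spotify.com. 172800 IN NS dns4.p07.nsone.net.
spotify.com. 172800 IN NS ns-cloud-a1.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a2.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a3.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a4.googledomains.com.
"""


def parent_domain(nameserver: str) -> str:
    """Return the last two DNS labels (naive heuristic, see note above)."""
    labels = nameserver.rstrip(".").split(".")
    return ".".join(labels[-2:])


def group_by_parent(zone_text: str) -> Dict[str, List[str]]:
    """Group NS targets by the parent domain of the nameserver host."""
    groups = defaultdict(list)
    for line in zone_text.splitlines():
        fields = line.split()
        # Expected layout: owner, TTL, class, type, target
        if len(fields) == 5 and fields[3] == "NS":
            groups[parent_domain(fields[4])].append(fields[4])
    return dict(groups)


if __name__ == "__main__":
    for parent, hosts in group_by_parent(NS_RECORDS).items():
        print(parent, len(hosts))
```

Run on the records above, this reports four nameservers under nsone.net and four under googledomains.com.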
Any non-offline playlists/songs just sit there not playing or telling me I'm offline
edit @ 37min: App also seems to be working again now
Seems to be a bigger issue.
edit: Nest is down too: http://nest.com
Fitbit.com is 404 too: https://www.fitbit.com
Big GCP issue?
edit2: Downdetector.com shows multiple websites and services as down, including Pokemon GO and Rocket League.
GCP status page is still green across the board: https://status.cloud.google.com
19:10 CET update: Some websites are coming back, including spotify.com, but their app still does not work for me.
Information about the outage was just added to the GCP status page. Direct link: https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...
Description: We are experiencing an issue with Cloud Networking beginning at Tuesday, 2021-11-16 09:53 US/Pacific.
Our engineering team continues to investigate the issue.
We will provide an update by Tuesday, 2021-11-16 10:40 US/Pacific with current details.
We apologize to all who are affected by the disruption.
19:20 CET update:
Description: We believe the issue with Cloud Networking is partially resolved.
Customers will be unable to apply changes to their load balancers until the issue is fully resolved.
We do not have an ETA for full resolution at this point.
We will provide an update by Tuesday, 2021-11-16 11:28 US/Pacific with current details.
Spotify desktop app still not working for me.
19:45 CET: Spotify app is back online for me.
Incident began at 2021-11-16 10:10 (all times are US/Pacific).
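Since this thread mixes CET timestamps with the US/Pacific times from the status page, here is a small sketch for converting between them. It assumes fixed offsets (UTC-8 for Pacific, UTC+1 for CET), which is only valid because the incident date falls in mid-November, outside daylight saving time in both zones. The function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed winter offsets, valid for 2021-11-16 only
# because neither zone is on daylight saving time on that date.
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def pacific_to_cet(stamp: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM' US/Pacific timestamp to CET."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=PST)
    return local.astimezone(CET).strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    for stamp in ("2021-11-16 09:53", "2021-11-16 10:10"):
        print(stamp, "US/Pacific ->", pacific_to_cet(stamp), "CET")
```

So 09:53 US/Pacific is 18:53 CET, and 10:10 US/Pacific is 19:10 CET.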
At least make the screen not show all green or something automatically
If the automation is working the services will be up. When an incident is happening it's because something is significantly broken, and automation won't properly understand what is and is not working.
For instance, lots of follow-on alarms might be firing for what are not actually issues with the things being monitored. I would imagine that datacenter temperatures and fan speeds dropped due to the incident, which might cause automation to suspect a facilities issue, but announcing a facilities issue would be misleading.
Or metrics around instances live might be tanking as autoscaling groups start downsizing. This would not be an issue with the autoscaling service, and automatically announcing an autoscaling outage would again be misleading.
In an incident, taking the available data and working out what is actually broken and what is merely a downstream effect requires skilled manual effort and is error-prone.
The broken ones is how I usually do it.
The automation doesn't need to do that; it doesn't need to analyse the situation. It needs to communicate "Hey. Our systems have seen this and have pinged humans, bear with us" rather than "nope, even though half the internet is down rn, it's all good baby".
Make a green tick a blue question mark or something. It doesn't even need to admit fault, it just needs to not be useless. My goal visiting the page is to get a link I can send clients saying "Updates will be posted here". Nothing more.
Also, if you're hosting your monitoring system on the same system it's monitoring, you've completely missed the point. At least use a different region within your cloud provider; better would be a completely different provider. I'd even go as far as using different domains/TLDs to host the page if I were Google-sized.
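To make the "blue question mark" idea concrete, here is a minimal sketch of the display logic I have in mind. Everything in it is hypothetical (the names, the three states, the inputs); it is not how any real status page works. The point is only that automation picks between "fine" and "unknown, humans pinged", while declaring or dismissing an incident stays a human decision.

```python
from enum import Enum


class Display(Enum):
    OK = "green tick"
    UNKNOWN = "blue question mark"
    INCIDENT = "red cross"


def display_state(alerts_firing: bool,
                  human_confirmed_incident: bool,
                  human_cleared: bool = False) -> Display:
    """Pick what the status page shows (hypothetical model).

    Automation never diagnoses anything. It only refuses to show
    green while alerts are firing and nobody has looked yet.
    """
    if human_confirmed_incident:
        return Display.INCIDENT
    if alerts_firing and not human_cleared:
        # Alerts fired, humans were paged, no verdict yet.
        return Display.UNKNOWN
    return Display.OK


if __name__ == "__main__":
    print(display_state(alerts_firing=True,
                        human_confirmed_incident=False).value)
```

The human_cleared flag is there for the follow-on alarm case raised above: an engineer can dismiss noise without the page having claimed everything was fine in the meantime.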
The status page will always lag the outage. It's not a conspiracy.
Now, I'm on Azure, but from the comments it seems the situations are similar. So, instead of an automatically updated status page that would help engineers do their jobs, we get a status page that isn't accurate, and customers have to pull teeth to get a service credit where/when one is due. And it seems like you can have your cake and eat it too here: while IANAL, a footnote in the SLA or the status page that "this is a machine estimate and not reflective of what goes into the SLA" should do it, no?
20 minutes seems pretty reasonable to me.
I might at least hold out some chance that Google Cloud will write an interesting PM, which is something Azure would never do IME.
That server continues to work just fine.
Edit: All of ours are back up. Some other services still seem down though.
There ought to be a law that essentially says if ads are “paying” for content, there must be a flawless link between ads and content such that the system can tell if the content is available (or detect after the fact that something was not delivered properly). And then, based on that, it either is required to ensure the ad never plays (since the content cannot be delivered), or that the user must be compensated in some way (e.g. we see you were forced to see an ad but got nothing so we are crediting $1 to your account).
Self hosted Spotify. Compatible with subsonic clients.
Some are 404ing at the moment and others work just fine. Feels like a GLB issue.
Nothing in my GCP dashboard seems to be aware of the issue however.
Only reason I found out is because I use an external service to ping me if a site is down.
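For reference, a bare-bones version of that kind of external check is only a few lines. This is a sketch using the Python standard library, not the service I actually use; the function names are made up, and treating any status of 400 or above as "down" is my own assumption. It has to run somewhere outside the provider being monitored to be of any use.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and so on.
        return 0


def is_down(status: int) -> bool:
    """Assumption: no response at all, or any 4xx/5xx, counts as down."""
    return status == 0 or status >= 400


def check_sites(urls, fetch=fetch_status):
    """Return the URLs that look down. `fetch` is injectable for testing."""
    return [url for url in urls if is_down(fetch(url))]


if __name__ == "__main__":
    for url in check_sites(["https://www.fitbit.com", "http://nest.com"]):
        print("DOWN:", url)
```

Hook the result up to whatever pages you (email, SMS, chat webhook) and run it from cron on a box that does not share a provider with the sites it checks.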
Everything is fine.
Everything is under control.
Do you think the DevOps teams at these billion dollar streaming companies are so clueless that they don’t have monitoring in place?
Do you think that people who go to a site when it’s down don’t see the same thing?
So whose awareness does this serve?
Yeah, Vercel is running some GCP services it seems
Not super noteworthy
Flags being gigadownvotes on this site suuuuuucks now. How do people learn if nobody can correct them? Daft.
I used to have a vouch option but I guess I used it too much haha