Hacker News
Error 404 (Not Found) (spotify.com)
400 points by fairytale 68 days ago | 183 comments



back up as of now (10:10a PDT)

This sure makes it easy to know who is hosted by Google by going to downdetector.com.


Not exactly hard to do that in the first place

  whois $(dig +short spotify.com A)
  
  NetName:        GOOGLE-CLOUD


That only says spotify.com is using GCP's DNS but not necessarily for hosting?

EDIT: I was wrong, as pointed out by shizcakes.


That's not what that says. It's doing a whois on the hosting IP address, not the DNS service.


Ah you're right! Thanks!


While we are at it, looks like spotify.com is using a mix of NS1 and GCP for DNS.

  spotify.com. 172800 IN NS dns1.p07.nsone.net.
  spotify.com. 172800 IN NS dns2.p07.nsone.net.
  spotify.com. 172800 IN NS dns3.p07.nsone.net.
  spotify.com. 172800 IN NS dns4.p07.nsone.net.
  spotify.com. 172800 IN NS ns-cloud-a1.googledomains.com.
  spotify.com. 172800 IN NS ns-cloud-a2.googledomains.com.
  spotify.com. 172800 IN NS ns-cloud-a3.googledomains.com.
  spotify.com. 172800 IN NS ns-cloud-a4.googledomains.com.


How can you name your service NS1 but then not use the ns1. subdomain for the first nameserver? SMH


Yeah, per the status page, there's a temporary mitigation rolled out while the team tries to figure out more... so no more 404s but load balancers will be locked down for a while while investigation continues.


Hmm it shows the same spike for instagram and aws as well. Would be funny to me if something on their end depends on GCP. https://downdetector.com/status/aws-amazon-web-services/ https://downdetector.com/status/instagram/


Google Analytics registering as client fails, maybe?


Can confirm. https://spotify.com is now working fine.


Can unconfirm. I can see their website but the app isn't playing nicely at all

Any non-offline playlists/songs just sit there not playing or telling me I'm offline

edit @ 37min: App also seems to be working again now


BigCommerce is likely one of those. Our BigCommerce store was down for several hours today.


GitHub GraphQL API seems to be throwing error responses for me still


Etsy is 404 too: https://www.etsy.com

Seems to be a bigger issue.

edit: Nest is down too: http://nest.com

Fitbit.com is 404 too: https://www.fitbit.com

Big GCP issue?

edit2: Downdetector.com shows multiple websites and services as down, including Pokemon GO and Rocket League.

GCP status page is still green across the board: https://status.cloud.google.com

19:10 CET update: Some websites are coming back, including spotify.com, but their app still does not work for me.

Information about the outage was just added to the GCP status page, direct link: https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...

Description: We are experiencing an issue with Cloud Networking beginning at Tuesday, 2021-11-16 09:53 US/Pacific.

Our engineering team continues to investigate the issue.

We will provide an update by Tuesday, 2021-11-16 10:40 US/Pacific with current details.

We apologize to all who are affected by the disruption.

19:20 CET update:

Description: We believe the issue with Cloud Networking is partially resolved.

Customers will be unable to apply changes to their load balancers until the issue is fully resolved.

We do not have an ETA for full resolution at this point.

We will provide an update by Tuesday, 2021-11-16 11:28 US/Pacific with current details.

Spotify desktop app still not working for me.

19:45 CET: Spotify app is back online for me.


Another cloud issue? Wow, nearly a week's uptime. https://news.ycombinator.com/item?id=29197832


When Spotify went down I switched to YouTube Music. Funny how that's still running but Spotify isn't, if it is in fact a Google issue.


Internal Google infra and GCP are different beasts. Sure some failure modes take both down but more likely than not they are independent.


That's why many choose AWS, they eat their own dog food.


Parts of Google also seem to be down: if you tap the three-line menu at the top right of google.com in a mobile browser, it opens https://www.google.com/mobile/?newwindow=1, which also returns 404 Not Found.


I’m assuming all these websites point to Google web frontend servers and for some reason it’s no longer able to map the Host header to the proper backend to proxy to.
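A toy model of what this guess implies, assuming a shared front-end fleet that picks a backend purely from the Host header (the class, route, and pool names are hypothetical, not Google's actual front-end design): if the routing table loses its entries, every customer hostname falls through to the same generic 404.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    body: str


@dataclass
class FrontEnd:
    """Hypothetical shared front end serving many customers' hostnames."""

    # Host header -> backend pool; a real LB would load this from pushed config.
    routes: dict[str, str] = field(default_factory=dict)

    def handle(self, host: str, path: str) -> Response:
        # Drop any ":port" suffix and normalize case before the lookup.
        hostname = host.split(":", 1)[0].lower()
        backend = self.routes.get(hostname)
        if backend is None:
            # No mapping: every affected site falls through to the same generic 404.
            return Response(404, "Error 404 (Not Found)!!1")
        return Response(200, f"proxied {path} to {backend}")


fe = FrontEnd({"spotify.com": "spotify-pool", "etsy.com": "etsy-pool"})
print(fe.handle("spotify.com", "/").status)  # 200
fe.routes.clear()  # simulate a bad or empty config push
print(fe.handle("spotify.com", "/").status)  # 404
```

That would also fit the "health checks kept arriving" observations further down: the backends stay up, only the mapping in front of them is gone.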


This is a recurring kind of outage for Google. They've wiped out the GSLB configs before. A year ago there was also that big outage where they blanked out the whole Gmail delivery config and started rejecting all mail (even for gmail.com). Config safety is not their strong suit.


For what it's worth, not only HTTP(S) load balancers went down. We have a couple TCP proxy load balancers that went down too.


I was seeing regular levels of health checks throughout the entire outage, so they still had configs in place. That seems plausible.



https://deno.land down too. I guess a lot of CI actions will be failing just like ours...


deno.land is working for me, while lots of other sites are not. Did they failover to another provider?


Working now, yes. Though deno.land's current IP still resolves (per nslookup) to something inside googleusercontent.com, so maybe Google has fixed things on their side. But it was definitely offline earlier today (cf. our GitHub CI failures and downforeveryoneorjustme).


It seems things are gradually coming back. All of our load balancers started working again.


discord seems to be down as well


homedepot.com is 404, various pages on lowes.com are 404


Egnyte had a pretty major outage today too.

https://status.egnyte.com


I have things running on GCP that are OK. I think it's related to nameservers.


I don't think so, https://dnsviz.net/d/spotify.com/YHUfXQ/dnssec/ shows that spotify.com resolved to the same 35.186.224.25 as it does now.


It is (was?) definitely a load balancer issue. We had multiple load balancers on completely different DNS providers fail.


All work for me in the EU apart from Etsy.


Homedepot.com appears down as well.


Gojek and Tokopedia too.


Evernote too.


Global: Experiencing Issue with Cloud networking

Incident began at 2021-11-16 10:10 (all times are US/Pacific).

https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...


Looks like perhaps an issue with Google Load Balancer. We have a load balancer in front of Google Storage Buckets, and can access resources directly from the buckets, but getting 404 when going through the load balancer.


Yep, it's the GLB. Went down for us at 5:46 GMT. Just responding 404 and logs reporting an internal error.


Non-engineer here - Is there an easy way to get multi-provider redundancy around this? Can you have LBs on multiple clouds and use DNS to move traffic around or something? Or does your LB have to be at the same provider as the app? Sorry if this makes no sense. :o


yes it's 100% technically possible, the main issue is it would be significantly more expensive.
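Roughly, the DNS side of that could look like the sketch below, assuming a DNS provider that runs external health checks and only hands out healthy endpoints. The provider labels, the `/healthz` check, and the IPs (documentation ranges) are placeholders, not any real provider's API.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    provider: str  # hypothetical label, e.g. "gcp-glb" or "aws-alb"
    ip: str        # documentation-range addresses used purely as placeholders
    healthy: bool  # result of a hypothetical external check such as GET /healthz


def dns_answers(endpoints: list[Endpoint], max_answers: int = 2) -> list[str]:
    """Return the A-record IPs a failover-aware DNS provider would hand out."""
    healthy = [e.ip for e in endpoints if e.healthy]
    if not healthy:
        # Everything looks down: fail open rather than return an empty answer.
        return [e.ip for e in endpoints][:max_answers]
    return healthy[:max_answers]


endpoints = [
    Endpoint("gcp-glb", "203.0.113.10", healthy=False),  # the LB serving 404s
    Endpoint("aws-alb", "198.51.100.20", healthy=True),
]
print(dns_answers(endpoints))  # ['198.51.100.20']
```

The health check has to treat a 404 from the LB as unhealthy, and resolvers keep cached answers until the record's TTL expires, so failover is only as fast as your TTLs allow. Running a second full stack on another cloud is where the extra cost comes from.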


I can confirm this on my side too


Half the internet is down because of a Google Cloud global issue on their load balancers, including Spotify and Etsy and GCP status is all green: https://status.cloud.google.com If you ever wondered why GCP is a distant third runner in the enterprise cloud space, here is your answer.


Expecting an instant public post is a bit unrealistic. They had a post up just over 20 minutes after the incident start, which is not that crazy, given that they needed time to triage all of the alarms and understand which component was actually breaking and confirm some technically correct information around it, even if the actual internal incident response can run without the public post.


It's Google, though. Don't they automate everything?

At least make the screen not show all green or something automatically


Which things should not be green?

If the automation is working the services will be up. When an incident is happening it's because something is significantly broken, and automation won't properly understand what is and is not working.

For instance, lots of follow-on alarms might be firing for what are not actually issues with the things being monitored: As an example, I would imagine that datacenter temperatures and fan speeds dropped due to the incident, which might cause automation to suspect a facilities issue, but announcing a facilities issue would be misleading.

Or metrics around instances live might be tanking as autoscaling groups start downsizing. This would not be an issue with the autoscaling service, and automatically announcing an autoscaling outage would again be misleading.

In an incident, taking the available data and reaching a conclusion about what is broken and what are effects is something which requires skilled manual effort and is error-prone.


> Which things should not be green?

The broken ones is how I usually do it.

The automation doesn't need to do that, it doesn't need to analyse the situation. It needs to communicate "Hey. Our systems have seen this and have pinged humans, bear with" rather than "nope even though half the internet is down rn, it's all good baby"

Make a green tick a blue questionmark or something. It doesn't even need to admit fault, it just needs to not be useless. My goal visiting the page is to get a link I can send clients "Updates will be posted here". Nothing more.

Also if you're hosting your monitoring system on the same system it's monitoring you've just completely missed the point. At least use a different region within your cloud provider, better would be completely different provider. I'd even go as far as using different domains/TLDs to host the page if I was Google sized
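The "blue question mark" idea doesn't need automation to diagnose anything. A minimal sketch, assuming the page only knows how many alerts are firing, whether humans were paged, and whether someone has confirmed an incident (all three inputs and the status names are hypothetical):

```python
from enum import Enum


class Status(Enum):
    OPERATIONAL = "green check"
    INVESTIGATING = "blue question mark"  # automation noticed something, humans paged
    INCIDENT = "red, human-confirmed incident"


def public_status(alerts_firing: int, humans_paged: bool, incident_confirmed: bool) -> Status:
    """Derive a status-page icon without asking automation to find the root cause."""
    if incident_confirmed:
        return Status.INCIDENT
    if alerts_firing > 0 and humans_paged:
        # No claim about what is broken or whose fault it is, just "we've noticed".
        return Status.INVESTIGATING
    return Status.OPERATIONAL


print(public_status(alerts_firing=40, humans_paged=True, incident_confirmed=False).value)
```

The red state still needs a human, which sidesteps the cause-vs-effect problem raised above, and the page itself would live on separate infrastructure from what it reports on.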


Monitoring should be on a different system, unaffected by an outage in the monitored system.


I think that's tangential to my point. The concerns in my post you replied to were about system interdependence making it hard for a monitoring system to separate cause and effect, even if that monitoring system is itself working properly.


The big three cloud platforms all have this issue of delayed status updates. Why do you think it's just GCP?


...and there's a reason for that. Automations to update the status page are rarely acceptable, since the status page statuses have legal and financial implications. Therefore, the IM usually has to update it (or tell someone to update it). But, realistically, when you get paged, you first need to figure out what exactly is wrong and at least a vague idea of why. Then, you need to tell someone to update the page. Then, it gets updated.

The status page will always lag the outage. It's not a conspiracy.


Status pages should be driven that way, though. "Legal and financial implications" and "it's not a conspiracy" are poor excuses.

Now, I'm on Azure, but it seems like from the comments the situations are similar. So, instead of an automatically updated status page that would help engineers do their jobs, we get a status page that isn't accurate, and customers have to pull teeth to get a service credit where/when one is due. And it seems like you can have your cake and eat it too here: while IANAL, a footnote in the SLA or on the status page saying "this is a machine estimate and not reflective of what goes into the SLA" should do it, no?


Not updating the status page, to avoid the legal and financial implications, is fraud -- taking money on false pretenses.


fraud? how? what guarantees do they make about timeliness of status updates on their services?


Also, in most teams, people who do external communication are different from those doing triage and troubleshooting.


Yeah, but they are still people who are responding to a page, working on wording and getting it approved, and then updating.

20 minutes seems pretty reasonable to me.


AWS typically has the same issue


Azure has the same issues with updating their status page. Sometimes it never happens.

I might at least hold out some chance that Google Cloud will write an interesting PM, which is something Azure would never do IME.


I'm old enough to remember the old claim that the internet was designed to cope with a nuclear attack.


Ironically it appears that IsItDownRightNow? is also down, although that could be because they're experiencing what is basically the equivalent of a DDoS.


I wonder if they have advertising methods in place to turn days like this into a windfall.


downdetector.com is the most resistant in my experience.


Time to switch to https://bandcamp.com.



Cool, how do I stream the newest Taylor Swift album on that?


You write the domain (ipfs.io) on a blackboard with a chalk pen then take your nails and scratch the board.

(jk.)


My colleague (who loves Taylor Swift) bought the mp3s from somewhere (amazon?) and uploaded them onto her plex server the hour it came out.

That server continues to work just fine.


As a heavy plex user, I can't imagine using it as my default music player. CX isn't great for music, IMO.


I wonder if a legal discovery will ever find internal status dashboards that reflect reality rather than fictitious SLA liability-aware status pages.


You don't need legal discovery for that. Every "X as a service" contract you sign will explicitly say that SLAs aren't dependent on dashboards/ping tests but rather a mostly subjective measure of "availability".


Your cynicism is justified and clearly based on the same experience I have :)


This is clearly the hottest thing on HN right now, and it was bumped from #1 to #6. Does anyone know why? Is it some kind of bot protection?


User flags, because outages are a fact of everyday life.


Which is dumb, linking to status pages shouldn't be on HN. A blog that has analysis and explanations of outages or post mortems should.

knock-knock dang


Dang doesn't see messages like that unless you use the footer Contact link, but I remember a comment from him a while back that I would summarize as "Some site users think it's a good use of HN, and other site users disagree and flag it, and we downweight/dedupe them sometimes and/or if someone emails us with the Contact link". I just didn't want you to wait for a reply that'll never come unless you Contact them.


hehe I know, I was just saying it for fun, but I appreciate the comment nonetheless.


https://linear.app/ is also down


We found extra rules in our GCLB routing config - removing them restored our service.


Netlify is also failing for us, and reporting bad TLS certs. Not sure if they use GCP https://www.netlifystatus.com/


Yes, confirmed: it's because Netlify's origin servers use Google Load Balancers. In this blog post they claim "We can easily move the entire brains of our service between Google, Amazon, and Rackspace in around 10 minutes with no service interruptions." Let's see if that's true. https://www.netlify.com/blog/2018/05/14/how-netlify-migrated...


We bypassed our Google load balancer and pointed DNS directly at the IP addresses of our servers and that seems to have helped


The title of the 404 page on all the down sites has an extra "1" after the exclamation points: "Error 404 (Not Found)!!1"


That's just a cutesy Internet meme thing.


That has been the case for years!


Sites that are down according to https://downdetector.com include Spotify, Discord, Snapchat, Etsy, Pokemon Go, Epic Games, Target, Paramount+, Evernote


Global: Experiencing Issue with Cloud networking Incident began at 2021-11-16 10:10 (all times are US/Pacific). https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...


GCP outage? Status page shows green but a bunch of sites seem down (Rocket League most importantly).


Most importantly this game I play for leisure is down!


I mean, that’s a fair point… leisure activity, whatever it is, is required for rest and to avoid burnout


I think it was a joke.


I wondered why our alerts started going nuts. Seems like basically every global Google Cloud load balancer went down. Doesn't seem to affect single-region network load balancers.

Edit: All of ours are back up. Some other services still seem down though.


Funny thing is, when you google Home Depot or Paramount Plus you get ads served by Google as the first result. When you click on one, Google then shows you a 404 page. I wonder if they'll get a refund on their AdWords campaign.


One of my pet peeves with so many services! Their obnoxious pre-ads can play flawlessly (stealing your time/eyeballs and giving them benefit), and they can still fail to give you the content you exchanged your time/eyeballs to see. Worse, they can repeatedly fail and repeatedly drill the same ads into your brain.

There ought to be a law that essentially says if ads are “paying” for content, there must be a flawless link between ads and content such that the system can tell if the content is available (or detect after the fact that something was not delivered properly). And then, based on that, it either is required to ensure the ad never plays (since the content cannot be delivered), or that the user must be compensated in some way (e.g. we see you were forced to see an ad but got nothing so we are crediting $1 to your account).


Why? That's not part of the ad contract. They will get refunded for GCP if it goes out of SLO.


I also wonder how many companies didn't want to admit they were using Google for their infrastructure. Downdetector shows AWS being affected, it'd be embarrassing if they were caught using Google Cloud Platform.


I seriously doubt that AWS or Facebook are using Google for infra. There's probably some other effect at play, like people using a Google service to connect to these things. I also don't see any effects on those services personally.


ok, so our API is down. We're on GCP...

https://api.newscatcherapi.com/v2/search



We were down. Just came back. Things seem to be resolving.


https://www.navidrome.org/

Self hosted Spotify. Compatible with subsonic clients.


On one hand, Spotify is much cheaper; on the other, perhaps artists get paid more (assuming you acquire music legally).


My company was seeing GCP Airflow environments not responding, but they seem to have recovered in the past few minutes.


https://overleaf.com/ is also 404 now


https://toggl.com/ is also affected.


I have a few services running from the same GKE cluster, same ingress controller, same nodes, same GLB, same everything.

Some are 404ing at the moment and others work just fine. Feels like a GLB issue.

Nothing in my GCP dashboard seems to be aware of the issue however.

Only reason I found out is because I use an external service to ping me if a site is down.


https://www.windy.com down, same issue.


The 404s changed into 502s. I guess that's progress. Fingers crossed it's back up soon.


Yes, but we're still on DEFCON 5.


Do you mean DEFCON 1 or 5? Trying to get a feel of the situation. https://en.wikipedia.org/wiki/DEFCON


Don't worry.

DEFCON 5.

Everything is fine. Everything is under control.

Or not...


Seeing the same - I have projects in us-east1 that went offline first, then us-west1 went offline a few minutes after. Everything green on their status page and nothing in the dashboard - everything returns a 404 so I'm assuming a really high level LB just took a dump.


Seems to have just resolved itself in us-east1 so I'm hoping us-west1 follows a few minutes after.


I don’t understand why people post website outages.

Do you think the DevOps teams at these billion dollar streaming companies are so clueless that they don’t have monitoring in place?

Do you think that people who go to a site when it’s down don’t see the same thing?

So whose awareness does this serve?


In general, people post things on HN to discuss them. This includes high profile web outages.


Looks like it's made it to the Google status page: https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...


Is it regional? Surprised Spotify is not active-active on other platforms like AWS and Azure.


Right now homedepot.com and the APIs that drive their mobile app are down too.


Not Google's best week.


Might be Google Cloud outage: https://news.ycombinator.com/item?id=29243753


Everything seems to be working on our end as of 1:08 PM EST.


Affecting us. Busiest time of the year and now down 20 min. It's the Global Load Balancer, so god knows what bit of the global edge has been taken out.


At least you don't run your own servers


One of the websites I've noticed this on is back up.


Discord doesn't allow me to connect either.


1:10PM - either it has resolved itself, or a regional issue, but I'm not seeing anything being down from the East coast USA.


As far as I can tell everything is up it's just that our load balancers aren't routing traffic and just returning 404s.


Funny enough, Datadog, which I was using to investigate one of my Vercel sites, is down too

Yeah, Vercel is running some GCP services it seems


Are you able to access your data in the AWS-hosted datadog instance?


Well, it's not like I know how to switch to it, but Datadog came back for me, probably because of that


I knew Google would one day take control of the web, just thought they'd have a more clever way of doing it.


Oh okay, thought something was on my end.


gcp load balancer responding with 404


Rocket League, perhaps even Epic Games


I was experiencing this issue with my Spotify app. Initially I thought it was my internet connection lol.


Seeing this across the board with providers on GCP. Firestore however does not appear to be down



That's why I like the clouds.


Discord down


My stuff is running OK on GCP, with GKE usage. Maybe it's related to nameservers?


if you are using regional load balancers or serving traffic directly from nodes then you would be fine :) "only" global LB failed


It looks like everything is back now. That was a short outage by recent standards...


Seems like another DNS issue - switching to Cloudflare 1.1.1.1 got me back online...


Seems to be an issue with the Google Cloud load balancer. Our website is down too


Seems to be coming back up.


Also experiencing that with all my GCP-related infrastructure (Europe)


Our infra in us-central1 behind gcp lb is impacted but not us-west1


My app was down too. Can confirm it is most likely the GCloud LB


Our instances started working again, so seems to be fixed


Pokémon GO also down


Same here, seems to be an issue with load balancers


GCP - my Cloud Run containers are giving 500s


overleaf.com also


My heart goes out to all the math students who were frantically texing their problem sets last minute


just wanted to start writing my thesis


(Overleaf co-founder here.) We're back up, so get writing :)


CBS.com is down, and some NYT pages as well.


Snapchat also having issues refreshing


Snapchat runs on the Google Cloud.


I noticed Discord went down for me.


veed.io is down too! Same problem


It was back in a few minutes. But that was pretty weird. App Engine was affected.


Things are back online again!


Lowes search was also down.


Discord seems to be down.


We're also affected.


and back up and running!


Seems to be back already


Discord is down too.


Seems to be fixed.


It's back up


It's back!


looks like it's back up?


[flagged]


That's just an inside joke and has been a default error page for Google/GCP for as long as I can remember.


Google’s 404 has always been like that


That's noteworthy. Do you have an archived url example? :O


Looks like it changed to that title in 2011. http://web.archive.org/web/20110408072631/http://www.google....


It’s been like that for all the time that I can remember (at least a decade)

Not super noteworthy


You're right. It's just bad.


For what it's worth I don't think you should have been flagged or even downvoted just for being wrong. Corrected yes, absolutely. Sinking to the bottom of low contrast lake was a bit much.

Flags being gigadownvotes on this site suuuuuucks now. How do people learn if nobody can correct them? Daft.

I used to have a vouch option but I guess I used it too much haha


nice



