Websites and APIs on Render are unavailable due to Cloudflare network errors

anurag · 2023-08-11T04:39:44

--- edit ---

Everything is back up. We're waiting for Cloudflare's RCA and will follow up with additional Render context right after.

------------

(Render CEO) While Cloudflare investigates the issue on their end, we're also working on ways to bypass Cloudflare.

Really sorry about this, folks. We'll keep https://status.render.com updated and will post an RCA once things calm down.

Cloudflare have declared an incident at https://www.cloudflarestatus.com/incidents/2xffnv666yd7.

In case you're wondering, we use Cloudflare to keep Render's network up during DDoS attacks. Both Render and our customers are often targeted. We've already started building a product that lets customers bypass Cloudflare altogether, and I expect we'll see more demand for it after today's incident.

jumploops · 2023-08-11T05:45:23

Thanks for the response anurag.

We really like Render but are running into issues with Cloudflare blocking requests that are incorrectly flagged as malicious (our service passes code blocks over HTTP, similar to Replit).

Not to mention our site was down for way too long this evening…

We’d consider staying if we can bypass Cloudflare altogether. Render has been stable otherwise.

supriyo-biswas · 2023-08-11T06:03:37

As a quick workaround, try base64 encoding your payloads.
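A minimal sketch of that workaround, assuming the request body is JSON and that the blocking comes from pattern matching on the raw text. The field name `code_b64` and both helper functions are hypothetical, not part of any Render or Cloudflare API. Base64 only changes how the payload looks on the wire; it adds no security of its own.

```python
import base64
import json


def encode_payload(source_code: str) -> str:
    """Wrap a code snippet in a JSON body with the snippet base64-encoded."""
    encoded = base64.b64encode(source_code.encode("utf-8")).decode("ascii")
    # "code_b64" is a made-up field name for this illustration.
    return json.dumps({"code_b64": encoded})


def decode_payload(body: str) -> str:
    """Reverse encode_payload on the receiving side."""
    encoded = json.loads(body)["code_b64"]
    return base64.b64decode(encoded).decode("utf-8")


# A snippet that a naive filter might flag if it were sent as plain text.
snippet = "<script>alert('hello')</script>"
body = encode_payload(snippet)

# The raw markup no longer appears in the request body...
assert "<script>" not in body
# ...and the server recovers the original text unchanged.
assert decode_payload(body) == snippet
```

The server has to decode before using the content, so both ends need to agree on the encoding.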

jumploops · 2023-08-11T06:39:43

Great idea for a simple fix. We’ve worked around the issue already, but it’d be nice if we could just trust our own authenticated content without the extra hoops.

anurag · 2023-08-11T14:47:13

Send me an email (address in profile) if you'd like alpha access to this feature.

jumploops · 2023-08-11T22:12:16

Sent.

reustle · 2023-08-11T05:29:52

> We've already started building a product that lets customers bypass Cloudflare altogether

Really happy to hear this. Thank you.

EspressoGPT · 2023-08-11T08:00:52

Thanks for the transparency here!

TekMol · 2023-08-11T05:30:45

Why are DDoS attacks a thing?

Who spends resources (money?) on running those? What is the incentive?

belter · 2023-08-11T10:47:50

They don't spend that many resources. Most participants in DDoS attacks are innocently recruited victims: victims either of their own ignorance or of developers' lack of care for secure defaults. In other words, some software product is deployed where it should not be. Then the people and/or AIs who want to run these attacks exploit standard protocol behavior.

"Memcrashed - Major amplification attacks from UDP port 11211" - https://blog.cloudflare.com/memcrashed-major-amplification-a...

bombcar · 2023-08-11T06:28:45

DDoS attacks basically are still a thing because there's nobody really incentivized to solve it.

The people harmed by them are too small to fix it, and the people big enough make more money selling DDoS mitigation.

From what I understand you can avoid many DDoS just by going IPv6 only, because DDoS mainly depends on unpatched shitmachines from the old days.

vasachi · 2023-08-11T08:34:49

IPv6 can help with a small scale attack, but not a large one. Your ISP can still be DDoSed, although it is a bit more difficult to do that.

maccard · 2023-08-11T12:45:24

> Who spends resources (money?) on running those?

A Raspberry Pi can generate enough traffic to overload an otherwise unprotected service. It doesn't cost much, if anything, to launch a brute force attack.

There have been posts on here about malicious browser extensions, infected IoT devices, and malware in mobile apps that give someone the means to launch an utterly brutal attack. Imagine if I had a service that could handle 10k rps. Now imagine 600k Android devices from all across the world send one request per second each [0].

[0] https://www.trendmicro.com/vinfo/pl/security/news/mobile-saf...
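To put rough numbers on that thought experiment, here is the arithmetic as a short sketch. The figures are the ones from the comment above (10k rps capacity, 600k devices, one request per second each); the variable names are only for illustration.

```python
capacity_rps = 10_000                  # what the service can handle
devices = 600_000                      # compromised devices in the botnet
requests_per_device_per_second = 1     # each device is barely doing anything

attack_rps = devices * requests_per_device_per_second
overload_factor = attack_rps / capacity_rps

print(f"attack traffic: {attack_rps:,} rps")
print(f"overload factor: {overload_factor:.0f}x capacity")
```

Each device sends a trivial amount of traffic, yet the aggregate is 60 times what the service can serve.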

senectus1 · 2023-08-11T05:44:50

typically done via hacked bot farms that cost the attacker nothing other than the fun of rolling out standardized scripted attacks on poorly configured servers.

Why they do it... well:

Competition suppression

Vindictive nastiness

Fun

Just because you can (the world is your sandbox)

Other reasons that might not occur to you but are very real for the attacker...

mousetree · 2023-08-11T10:25:01

We (fintech bank) were DDoSed a few times and were sent ransom emails

seanthemon · 2023-08-11T12:47:51

"we'll stop these DDoS attacks if you pay us!" - chris@notcloudflare.com

The ol' window manufacturer trick /s

xwdv · 2023-08-11T14:54:01

HN regular conducts DDoS attacks on small weak websites.

cancan · 2023-08-11T05:44:22

(Happy Render customer) — Looking forward to the updates on this feature.

015a · 2023-08-11T16:26:03

I'm going to share a probably controversial opinion. That opinion is: I can't stand an outage title like "Websites and APIs on Render are unavailable due to Cloudflare network errors". It's passing blame. I run an app or two on Render. I don't pay Cloudflare; I pay Render. Take responsibility for the infrastructure decisions that you make for your customers; don't pass blame to your infrastructure providers.

anurag · 2023-08-11T17:21:43

We take full responsibility for the infrastructure choices that led to this outage. As the peer comment said, it's helpful to overshare in these situations.

We know developers don't actually care who's at fault and will move off of Render if we're down, period. Even before the incident, we'd started working on a project to eliminate the SPOF with Cloudflare, and now it's only a matter of time before we ship it.

015a · 2023-08-13T03:45:32

I get that, and the update is much appreciated. I don't mean to insinuate that this was the intention behind why that language was chosen; it's just the sentiment that the language conveys, and that's why I'm not a fan of it.

The stance that I take is: it's a fine line between Oversharing and Passing Blame in outages like this, and while I'm happy that a line like that, when shared by Render, means it was just oversharing (I love your product!), it's easy to see how a line like that, when shared by a less admirable company, could be seen as "Nah man, it's not on us, we didn't do anything wrong." A critical difference being: if Cloudflare was the cause, how are we working toward avoiding this cause in the future? Which leads nicely to where pointing at Cloudflare (or any upstream provider) generally feels more agreeable: the retro.

To be clear; I have no intention of leaving Render, even if y'all weren't planning to alleviate this SPOF. I fully grok the difficult engineering required to nuke SPOFs like Cloudflare or AWS; and a bit of downtime here and there is a price I'm fine with paying.

anurag · 2023-08-13T18:29:02

Heard, loud and clear. Thanks for the support.

true_religion · 2023-08-11T16:45:35

If Render's data center was down due to a citywide power outage I would still like to know because it’s the root cause.

I would still blame them for not having backup generators though.

However a failure to plan for emergencies is different from other kinds of failure.

015a · 2023-08-13T03:48:53

Sure, but there's a time and a place. Outages involve high tensions and fog of war; and you said it yourself, you're already ready to blame them for not having backup generators in this hypothetical example. The midst of an outage is not the time to start casting blame, on people, organizations, processes, providers, whatever. Outages are the time to fix; retros are the time to blame (within productive reason, of course).

pushdownandturn · 2023-08-11T18:19:48

If you ran it outside of Render, would you be using a CDN service or building your own?

The bigger issue you're alluding to is that of supply-chain reliability in SaaS products: when AWS goes down, multiple other (seemingly unrelated) services go down. But saying it's the downstream service's fault is pointless, because if you were to do it yourself you'd be using the same upstream provider, and be dealing with their outage yourself.

In that example, Slack, as a bigger customer of AWS, would have a much bigger say, and a more direct line to AWS engineers, than you would.

015a · 2023-08-13T03:58:34

Right, and I think there's an interesting transitive correlation here: As a customer of Render, while Render was down because of Cloudflare; is it appropriate for me to post on our outage page: "Service interruption due to issues at Render"? "Service interruption due to issues at Cloudflare"? What does Cloudflare post on their page? (Well, they may actually post "due to a busted AC unit in our Seattle data center" which, you know, at that point we've hit bedrock so maybe that's valuable, but)

It's turtles all the way down, and in the midst of an outage I totally empathize with the off-the-cuff thinking that oversharing is better than undersharing, but after the fog of war clears you can even retro language like that and come to a different conclusion. What value do my customers, even if they're highly technical, gain by knowing it's Render's fault that MyCoolService was down? Are they going to go open support tickets with Render? I'd bet Render very reasonably wouldn't appreciate that, and they're not going to have a better trunk to their support than I do.

Sytten · 2023-08-11T12:14:53

Resolved now, but an hour of downtime really shows you why you are paying for bigger cloud providers with an SLA and customer support. Honestly I wish we could have turned Cloudflare off for the duration of the issue rather than having our API be down...

Maybe time to consider multiple CDN providers as an abstraction like you consider AWS/GCP as an abstraction.

anurag · 2023-08-11T14:54:32

> maybe time to consider multiple CDN providers

Yes. While Cloudflare is generally rock solid, we can't let this happen again.

boesboes · 2023-08-11T15:35:39

That is actually a selling point too. I looked at both Fastly and Cloudflare a while back to replace the budget CDN at my previous job after an outage, but found both had had more, and more serious, downtime in the 24 months before that. So I just made a script to quickly switch between providers, but I'd rather not have to deal with it at all :-)
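A switching script like that could take many forms, and the commenter's actual script isn't shown. As a rough sketch of just the selection logic, assuming a preference-ordered list of providers and some health probe: the provider names, `pick_provider`, and the stubbed health check below are all hypothetical, and the step that actually repoints traffic (for example a DNS update) is left out.

```python
from typing import Callable, Optional, Sequence


def pick_provider(
    providers: Sequence[str],
    is_healthy: Callable[[str], bool],
    current: Optional[str] = None,
) -> Optional[str]:
    """Return the provider to route traffic through.

    Keeps the current provider while it is healthy, to avoid flapping;
    otherwise returns the first healthy provider in preference order,
    or None if every provider fails its health check.
    """
    if current is not None and current in providers and is_healthy(current):
        return current
    for name in providers:
        if is_healthy(name):
            return name
    return None


# Hypothetical provider names and a stubbed health check for illustration.
status = {"cdn-a": False, "cdn-b": True}
choice = pick_provider(["cdn-a", "cdn-b"], lambda name: status[name], current="cdn-a")
print(choice)  # cdn-b, because cdn-a is failing its health check
```

In a real setup `is_healthy` would be an HTTP probe through each provider, and the result would feed whatever mechanism changes where the domain points.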

jamil7 · 2023-08-11T05:58:27

Kind of worrying that something like Cloudflare is so deeply baked into Render and customers don’t have a choice on whether or not they’re using it.

dbbk · 2023-08-11T12:57:34

Not really, Render is a managed stack, just like Heroku. If you want full control over the stack you can obviously do that. You pick Render so these choices are managed for you.

NicoJuicy · 2023-08-11T09:08:19

Picking the technology to build your product upon is not the customer's decision. It's up to whoever creates the product that Render's customers want.

It would be the same as requiring them to use Postgres instead of MS SQL as a backend.

cpursley · 2023-08-11T13:33:45

I still find the trade-off worth it compared to trying to roll out our own infrastructure (no, k8s isn’t “easy”). And it looks like they are already working on a more robust solution around this particular issue.

shash7 · 2023-08-11T04:42:41

It seems to be an error on Cloudflare's side. Render seems to be using Cloudflare in some integral capacity, which has turned it to toast.

As a Render customer, it's affecting us too. Hope Cloudflare fixes this asap.

reustle · 2023-08-11T05:19:52

As a Render customer, I wish I had the option to not use Cloudflare in any way.

eyeownyde · 2023-08-11T08:10:10

Why is that? Are these issues common?

wongarsu · 2023-08-11T11:45:44

Some don't like them on ideological grounds (centralization of a sizable portion of the internet), some don't like how Cloudflare can make your browsing experience miserable if you use Tor or turn on Firefox's anti-fingerprinting features (and Cloudflare is a major part of the reason these features are off by default)

breakingcups · 2023-08-11T17:43:49

I don't like Cloudflare skirting the responsibility of a hosting provider by claiming to be a neutral third party similar to an ISP instead of a company paid by their customers to distribute their content on the web.

Also them not taking a stance on housing despicable stuff like KF that literally bully people to their deaths.

psnehanshu · 2023-08-11T05:51:12

Cloudflare just published a resolution 7 minutes ago.

anaganisk · 2023-08-11T07:12:04

Unrelated but the choice of background(white) and choice of color(white) of the title in the "hero" section of the website is poor on mobile. I assume the site UI wasn't tested on mobile? https://i.imgur.com/M1n6SYv.jpg

EspressoGPT · 2023-08-11T07:59:53

What – this text is and always has been black for me.

wongarsu · 2023-08-11T11:47:20

If the website doesn't set the font color, it might be black in light mode but white when the device/os/browser is in dark mode

gregsadetsky · 2023-08-11T05:13:57

render.com down, our sites hosted there down as well... it sucks but it happens.

best of luck render & cloudflare teams!

EDIT: it's back! yay

tebbers · 2023-08-11T05:27:26

Hmmm, render.com is loading fine from London, UK for me.

gregsadetsky · 2023-08-11T05:28:27

it just came back online now - they were down when I checked

nik736 · 2023-08-11T13:22:49

So Render is single homed to Cloudflare?