
Cloudflare and RSS - vog
http://www.tedunangst.com/flak/post/cloudflare-and-rss
======
deno
Cloudflare’s misguided reliance on Javascript paywalls[1] is fundamentally
hostile to the open web. It’s essentially a form of DRM.

And they don’t even bother to implement it properly. For example, if your site
tries to follow best practices and uses a separate domain for your static
assets, you will just get errors on those assets, resulting in a page with
broken styling and no images. That’s on top of pissing off your users by
making them go through the Google-hosted captcha (which also breaks all the
time, btw[2][3]).

One of the websites that was horribly broken by this was Stack Overflow, as
anyone trying to stay safe on public WiFi by using a VPN can attest.

Coincidentally, Cloudflare has recently lost Stack Overflow as a customer:
[https://meta.stackoverflow.com/questions/323537/cloudflare-i...](https://meta.stackoverflow.com/questions/323537/cloudflare-is-ruining-stack-overflow-for-me-with-its-recaptcha)

They’re now behind Fastly.


[1]
[https://ipfs.pics/QmTZo6oPKHwUgWB7p7LfZwZsVQJV1n7k9qNQNZBCEu...](https://ipfs.pics/QmTZo6oPKHwUgWB7p7LfZwZsVQJV1n7k9qNQNZBCEuYSb5)

[2]
[https://ipfs.pics/QmeuJjgV621NV9aNKyNAUoEHdWZYtzCrwkLHoHneg3...](https://ipfs.pics/QmeuJjgV621NV9aNKyNAUoEHdWZYtzCrwkLHoHneg3aSJE)

[3]
[https://ipfs.pics/QmRWcCkBdaG214GKttkGFcadncUJ6YvfMTSE8jiAxA...](https://ipfs.pics/QmRWcCkBdaG214GKttkGFcadncUJ6YvfMTSE8jiAxAtEtk)

bonus picture:
[https://ipfs.pics/QmPkncvs2R9EkhZQuzPzWYs4z7UUdKqQzg1k8mc7y5...](https://ipfs.pics/QmPkncvs2R9EkhZQuzPzWYs4z7UUdKqQzg1k8mc7y5X1pN)

~~~
kyledrake
And if you turn off Cloudflare's protection to fix this, somebody that wants
to censor you and has $20 will use one of the hundreds of DDoS booters (most
of them are behind Cloudflare) to nuke your site, unless you're Brian Krebs
and qualify for Project Shield.

I'm very optimistic about the direction the internet is going in right now.

~~~
kefka
And notice those weird hashed links? IPFS hashes.

As a quick primer: an IPFS hash points at a directory of content.
Everything's deduplicated. The design draws on BitTorrent, Git, and
self-certifying filesystems.

An IPFS hash is immutable: it always points at the same content, no matter
what. Indestructible. Publish stuff with

ipfs add -r folder

An IPNS hash points at an IPFS hash. It's a mutable pointer that you
republish every 12 hours. Do this with

ipfs name publish <ipfs hash of resource>

With the browser plugin and IPFS running locally, you can resolve a site's
DNS TXT record to its IPNS hash, and you never touch the origin website!

Example, PageNodes :

[http://ipfs.io/ipns/QmVjH4F65fnqy1GkBBYiuAkdazKzYsw3LbMVANGF...](http://ipfs.io/ipns/QmVjH4F65fnqy1GkBBYiuAkdazKzYsw3LbMVANGFeBGB8e/)
POINTS AT --->
[http://ipfs.io/ipfs/QmbLPfyehFnViKZpU237P6a6DpjCfWFSoDBMQFGU...](http://ipfs.io/ipfs/QmbLPfyehFnViKZpU237P6a6DpjCfWFSoDBMQFGUAgYW2t/)

TL;DR: DDoS makes no sense against IPFS. Everybody joining makes the network
faster.

~~~
delluminatus
Sorry, but the OP wasn't talking about ipfs at all. He was talking about
Cloudflare.

Having said that, thanks for the interesting digression. You've made me want
to try out ipfs.

~~~
kefka
Heh, in my mind it made perfect sense.

CloudFlare is an anti-DDoS and CDN network. IPFS is a CDN protocol that anyone
can join or put files into. It doesn't quite hide the endpoints, but anyone
can inject data.

It does what CloudFlare does, but better. And as more people/nodes come
online, it stays free and ubiquitous.

------
heavymark
Sounds like the website owner didn't set up a page rule properly for its RSS
feeds in CloudFlare. If you flip a switch and don't customize things properly
you will likely run into issues like this, but that's not Cloudflare's fault;
it's the site's webmaster you should be contacting.

~~~
smacktoward
Why on earth shouldn't it do the right thing out of the box? Isn't "this stuff
is hard, let us figure it out for you" CloudFlare's entire value proposition?

~~~
koolba
> Why on earth shouldn't it do the right thing out of the box?

Which is what exactly? The default being to have a uniform service across all
asset types seems sensible enough to me. It's either that or everything is off
by default and you manually include paths/expressions that are included. The
latter seems more of a hassle for most people.

> Isn't "this stuff is hard, let us figure it out for you" CloudFlare's entire
> value proposition?

I'm not sure what their angle is. As far as I can tell it's to re-centralize
the internet so they can be the single tap point where SSL is added and
removed[1]. Once they've got a sizeable chunk of the world funneling through
them, they could make some decent coin giving direct access to the feds.

Until then, it's just free SSL and DDoS protection. Can't really complain
about free either. Heck, just don't use it.

[1]: [http://www.newyorker.com/wp-content/uploads/2013/11/nsa-smil...](http://www.newyorker.com/wp-content/uploads/2013/11/nsa-smiley-face-580.jpg)

~~~
daenney
> Heck just don't use it.

That really is not a viable option when you want to access content which
happens to (now) be behind Cloudflare.

~~~
koolba
> That really is not a viable option when you want to access content which
> happens to (now) be behind Cloudflare.

Sure it is. Vote with your bytes and move along.

------
Navarr
The real solution here is that the blog owner needs to turn off security for
their RSS feed URLs.

Cloudflare could make this easier: if it detected that a site had an RSS
feed, it could offer a suggestion, via email or the browser (or both), like
"Create a security exception for your RSS feed".

~~~
LeifCarrotson
No, the blog owner still needs protection on the RSS feeds.

If they publish something that offends someone with a botnet, and have an
exception for the RSS feed, the offended someone will just attack the feed to
take down the site.

The real solution is for the feed reader to contain better error handling.

~~~
afandian
You can surely do DDOS protection on a particular URL without a bot detector.

~~~
LeifCarrotson
Sure. For example, return a 503 for requests to that URL, or present a
captcha. But then feed readers that get caught in the DDoS protection, can't
solve the captcha, and don't handle errors gracefully will fail to get the
feed.
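The "better error handling" meant here can be sketched minimally in Python.
The transport is injected so the retry logic stands alone; the status code
handling and backoff values are illustrative, not anything Cloudflare
documents:

```python
import time

def fetch_with_backoff(url, get, retries=3, base_delay=1.0, sleep=time.sleep):
    """Fetch a feed, retrying on 503 with exponential backoff.

    `get` is any callable returning (status, body); a 503 is treated as
    "temporarily challenged/overloaded, try again later" rather than as
    a permanent failure the reader silently swallows.
    """
    for attempt in range(retries):
        status, body = get(url)
        if status != 503:
            return status, body
        sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body  # still failing: surface the 503 to the caller
```

A reader built this way degrades gracefully: it keeps the last good copy of
the feed and retries later, instead of dropping the subscription on the first
challenge page.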

------
cm3
For a year now it has been very hard to convince Google's reCAPTCHA that I, a
human, selected the correct 3 (or 4) rivers, houses, street numbers, or
mountains. Most of the time it makes me take the test twice or more, even
when my selections are correct. I feel like, if we're being used to help
train their neural network, it should at least have the courtesy not to
require so many I-don't-believe-you-please-retry iterations.

Given how many sites use Cloudflare, I'd welcome a switch to a more reliable
captcha service until Google repairs their scripts.

Oh and other captcha services work in Tor Browser Bundle.

~~~
beardog
reCAPTCHA intentionally occasionally calls correct answers wrong in order to
make it harder for bots to learn.

~~~
zwily
If getting the right answer isn't adequate, what's the point?

------
sudhirj
Might be a problem with Cloudflare's marketing. I use it for my sites as
well, but I don't think of it as a magic button / service that makes all my
problems go away.

Cloudflare is a hosted reverse proxy service that handles DDoS protection.
Its enforcing rules on RSS pages is no different from me putting a captcha
extension on nginx or Apache and having it run on all pages by mistake.

And this isn't even a default-configuration issue. This is simply a
misconfigured service, no different from a misconfigured haproxy / nginx /
Apache / CloudFront.

------
jgrahamc
The author can email me jgc @ cloudflare and I'll be happy to help.

~~~
medecau
Internet scale right here.

~~~
jgrahamc
I was suggesting that the person had a problem with our service and could try
contacting us. The 'story' posted has little detail of any kind so it is hard
to assist.

~~~
mey
The story is about a general practice/design of CloudFlare, not a specific
site. Fixing it for one site or one user's IP address won't fix the
fundamental design: CloudFlare expects a human and a web browser behind every
request.

~~~
jgrahamc
I am Cloudflare's CTO. I know how our systems operate. I was asking for this
person to contact me so I can understand what is happening in this instance.

~~~
webscaleizfun
The two issues the author brings up are broad issues I run into with
Cloudflare protected sites all the time. The fundamental assumption that
everything using the internet has a full JS engine and a human immediately
ready to solve a reCAPTCHA is flawed, and thinking instances like this are a
one-off is inherently wrong.

Torrent traffic, file transfers, VOIP and all the other non-HTTP type traffic
that Cloudflare just breaks by default make up a good chunk of the traffic you
see on the web. That Cloudflare pitches itself as a one and done solution with
minimal configuration just makes this worse, since website owners generally
won't bother to set up custom rules for RSS feeds and the like. Additionally,
if even 1% of the RSS feeds that were broken by cloudflare were emailed to
you, your inbox would be flooded.

~~~
jgrahamc
_Torrent traffic, file transfers, VOIP and all the other non-HTTP type traffic
that Cloudflare just breaks by default make up a good chunk of the traffic you
see on the web._

Huh? We don't handle non-HTTP traffic. How can we break it?

~~~
dredmorbius
wget or curl might be examples of breakage.

I'm known to use console/text clients (w3m, lynx, links, elinks[2]) from time
to time. Cloudflare definitely interferes with these.

Not sure about the other examples given.

And, to hijack: I wanted to say thanks for the work on a Tor-friendly
anonymised reputation system. I've commented on that in the past, and need to
take a closer look / see others' thoughts, but definitely appreciate the
effort.

------
bonkabonka
I switched to using
[https://github.com/Anorov/cloudflare-scrape](https://github.com/Anorov/cloudflare-scrape)
in my RSS bot because several of the blogs I follow moved to CloudFlare. It
wasn't precisely a drop-in replacement for Requests, but it wasn't too hard
to wire up.

~~~
spikej
That's neat! I've been using
[https://phantomjscloud.com/](https://phantomjscloud.com/) to render pages
and then use that output.

------
blfr
_Just reading the blog in my browser is now somewhat hampered because
Cloudflare thinks I’m some sort of cyberterrorist and requires my browser to
run a javascript anti-turing test._

Google does the same thing on Youtube when you're browsing from a "bad
neighbourhood" (OVH). Incredibly annoying and, unlike Cloudflare, redirects
you to youtube.com instead of the video you wanted to watch. The problem keeps
reoccurring despite being logged into a Google account.

Ironically, youtube-dl from the same IP works just fine. So I don't know what
they're protecting. Are they trying to prevent automated comment-reading?

~~~
jakobegger
Maybe they want to prevent services that increase play count?

------
problems
CloudFlare has page rules for a reason. It's trivial to configure no anti-bot
protection on your RSS feeds.

In fact, it's possible that the person he was hitting was in "I'm under
attack" mode or similar, which would try to reduce bot hits to the web server
by any means necessary to prevent a layer 7 attack from taking the site
offline.

~~~
alpb
But it sounds like that should be the default for these feed/Atom/XML
Content-Types? Humans do not read RSS feeds.
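A sketch of what such a default might key on. Which types a proxy would
actually whitelist by default is an assumption here; these are just the MIME
types commonly used for syndication feeds:

```python
# MIME types commonly used for syndication feeds; which of these a proxy
# would whitelist by default is an assumption, not Cloudflare policy.
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/xml",
    "text/xml",
}

def is_feed_response(content_type_header):
    """True if a Content-Type header value names a feed-ish type."""
    if not content_type_header:
        return False
    # Strip parameters like "; charset=utf-8" and normalize case.
    mime = content_type_header.split(";")[0].strip().lower()
    return mime in FEED_TYPES
```

Note this classifies the *response*, which is exactly the chicken-and-egg
problem raised in the reply below: you only learn the content type by letting
the request through.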

~~~
problems
They'd have to probe every endpoint to figure out the content-type of the
return.

And if your web application allows queries that produce RSS feeds, simply
ignoring all feeds could still result in a really bad L7 attack. No caching
plus randomized queries would knock a small site offline in no time.

~~~
tracker1
I would think that a good caching solution on the server for RSS requests
could perform decently even under DDoS scenarios... though this is why people
put other services in front of their RSS feeds, so that they were better
cached. Most blogs aren't generating more than a couple of new articles a
day, so caching for 15-120 minutes wouldn't be an issue for most use cases.
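A minimal sketch of that idea (names are illustrative): cache each feed's
rendered body for a fixed TTL, so repeated hits, malicious or not, never
reach the expensive generator more than once per window.

```python
import time

class FeedCache:
    """Tiny TTL cache: regenerate a feed at most once per `ttl` seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._entries = {}          # url -> (expires_at, body)

    def get(self, url, generate):
        """Return the cached body, regenerating only when the TTL lapses."""
        now = self.clock()
        entry = self._entries.get(url)
        if entry and entry[0] > now:
            return entry[1]
        body = generate()           # the expensive render hit
        self._entries[url] = (now + self.ttl, body)
        return body
```

With a 15-minute TTL, even a flood of feed requests costs the origin one
render per feed per window; the rest are served from memory.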

------
parennoob
I'm tired of seeing CAPTCHAs, redirects, those awful Google "I'm not a robot!"
tests, and being forced to fork over my phone number every time I register an
email account (it is no longer possible to get an email account at Gmail or
Yahoo without giving them a phone number). It seems like no one just wants to
serve regular web pages _fast_ any more.

In my opinion, anyone who puts this sort of stuff in front of their blog is
overly paranoid (unless they are some sort of high-profile victim, like Brian
Krebs). Just avoid reading their blog.

~~~
neoCrimeLabs
Let me introduce you to the concept of defence in depth:

[https://en.wikipedia.org/wiki/Defense_in_depth_(computing)](https://en.wikipedia.org/wiki/Defense_in_depth_\(computing\))

Really, the longer a website is online, the more it attracts bots. Spam bots,
brute-force login attempting bots, information gathering bots, vulnerability
testing bots, and many more. This isn't even counting targeted attacks.

I've seen many small websites receive much more bot traffic than user traffic.
Sometimes to a crippling level.

My point is, it's not paranoia if they really are after you - even if it's a
mindless mass of bots.

Like all security though, sometimes you affect legitimate users. This is why
it's great to have multiple ways they can provide feedback.

~~~
tracker1
At [a certain auto website I used to work at] the bot traffic was a
_significant_ amount of the requests... Because of the nature of referral
traffic, we were required to let the bot traffic stand...

This was mitigated by implementing a few layers of a decent caching strategy,
as well as some db improvements, and moving search queries to a separate
database server (mongodb, then elasticsearch) altogether.

In the end, there are a lot of things you can do to help mitigate these
things... It really just depends on what you are trying to accomplish with a
given site.

------
kev009
Abject failure IMHO. A CDN must be as transparent as possible for both end
users and the sites using it. DDoS can and should be dealt with via passive
detection.

------
acdha
Cloudflare offers configuration. Users may choose to configure paranoid
security options which are unsuitable for non-browser usage but that's true of
any hosting option.

~~~
omouse
The default configuration is the damned problem.

~~~
Kalium
How do you think it should be different?

------
ryanlol
It's really scary how many of the responses here are "lol the user needs to
change the config". I guess these are the same guys that think that mongo
should bind on 0.0.0.0 by default, the users can change it.

Maybe have working default configs instead? Users are not going to change the
configs unless you make them.

~~~
tracker1
Those wouldn't be the same guys that bind mongo to 0.0.0.0 by default... the
default config in this instance _is_ more secure... which means it's more
annoying in scenarios it isn't configured for...

It's like having a firewall and then complaining to Mongo because you can't
access your database from another computer.

------
ruchit47
Maybe it's a bad implementation of Cloudflare. Cloudflare doesn't do the
JavaScript check if the content type in the headers is XML, unless you
explicitly tell it to. RSS feeds and similar URLs should be excluded from
security with page rules.

------
rcarmo
Hmmm. I've had an intermittent problem where my feed suddenly regurgitates
past items without warning. Wondering if this might have something to do with
it somehow...

------
vog
Wow, seeing that level of incompetence happening at a company like Cloudflare
is quite astonishing (i.e. hard to believe) ... and utterly disappointing.

~~~
ShakataGaNai
Who says it's incompetence? It's possible someone has their cloudflare
security turned up to 11. Maybe they have hacker problems and would rather
risk a few lost clicks than getting completely p0wned.

Also keep in mind that just because it's "RSS" doesn't mean there is a quick
and easy way to exclude it from security. On your average Wordpress blog the
feed URL is /somethingrandom/feed/. So either Cloudflare assumes every URL
containing /feed/ should be exempt, or it has to read the contents of every
page to check for exclusions.

Also do keep in mind that if you're a Cloudflare customer you can easily
exclude specific URLs from this type of security scrutiny. So perhaps the
blog owner is incompetent? Or perhaps the person posting this is on a network
with a computer that's infected with a botnet.

Who knows. This "article" is shit.

~~~
zzzcpan
Here's the thing: most website owners don't know any better; it's up to
their CDN providers, like Cloudflare, to provide adequate security defaults.
Don't make it the user's responsibility. At the end of the day I'm annoyed by
Cloudflare forcing me to enable javascript, not by any of their customers.

~~~
predakanga
When the website owner decides to use Cloudflare in front of their site, it is
absolutely their responsibility to ensure that it's configured properly. Not
Cloudflare's and certainly not the user's.

And for that matter, it's entirely possible that they consider this correct
behaviour. They may have a poorly written dynamic RSS feed that doesn't cache
for instance, and _want_ it protected.

Moreover, Cloudflare provides plenty of ways for the website owner to avoid
this behaviour - the global security level can be adjusted, settings for
individual endpoints can be adjusted, even settings for individual IP ranges
can be adjusted.

As a user, you have no idea what settings the website owner has selected; it
seems rash to blame Cloudflare in that context.

~~~
zzzcpan
There is no one else to blame. Website owners are just people, most will never
be able to understand the service and all of the implications of the
configurations it provides. Cloudflare is the only one here able to do
something about it, not their customers or users.

I don't even know why this is a discussion. Not blaming customers and users
for anything is common sense.

------
cocktailpeanuts
I wonder how many people still use RSS to subscribe to sites, even the HN
crowd.

Personally I stopped using "traditional" RSS readers years ago, and stopped
using more "modern" RSS readers like Flipboard or Feedly in the last couple
of years, since they're just a subset of what I can find on Twitter.

Most people who own a blog have a Twitter and they share all their posts on
their feed anyway, so I just follow them on Twitter and use Twitter as "RSS
reader".

Before you open web standards advocates throw rocks at me: I am not an open
standards hater either. I have built a couple of RSS-related apps as projects
in the past and still believe that's the way to go in the long term, but in
2016 I can't really find a reason to still use an RSS reader.

In fact, scenarios like the one the OP mentioned in the article are exactly
why I would rather use Twitter. Why deprive yourself of an opportunity to
read content from someone you like, when you have a totally free option?

~~~
pavel_lishin
> _I just follow them on Twitter and use Twitter as "RSS reader"._

This quickly breaks down. I subscribe to 1096 sites according to Newsblur. I
would 100% miss some content from those sites if I only followed them on
Twitter.

And for some of those sites, that's probably fine; a fair amount are just
'entertainment'. But there are updates I would hate to miss, even if they
tweeted links to them at 3am or right before a huge tweet-storm from someone
else.

That is _exactly_ the problem RSS was designed to solve.

~~~
michilehr
Or when you were on vacation, taking a break from work and thus also from the
tech stuff. RSS is great for not missing anything. All that interest-based
sorting on social media has made RSS much more important to me.

