Cloudflare is turning off the internet for me (dijit.sh)
454 points by dijit 32 days ago | 310 comments



It appears that you may have made some modifications to your user agent string. If you revert your user agent to the one provided by default by your browser vendor everything will be fine.


Why is it that something as malleable as a user-agent string trips these kinds of sensors?

If I were to write a bot, copying current browsers' user-agents is literally the first thing I'd do


It really drives home how shitty the web has become. Ad companies and malware distributors come up with ever-worse ways to interfere with my browsing, and the “good guys” have to respond with increasingly invasive and fragile countermeasures.

Sort of like having to take off your shoes when you board a plane. If that’s what it takes, isn’t it just better to stay home?


> Sort of like having to take off your shoes when you board a plane. If that’s what it takes, isn’t it just better to stay home?

Removal of shoes, 'naked' full body scanners, these are all terrible, and I tell myself every time it isn't worth the hassle.

The reality is that as much as I hate it, I'm still flying every other week.

I'm also on the Internet daily. I don't see that changing.


For me it did change. I stopped flying, and I stopped visiting websites that won't accept my tracking blockers.


Honestly what is the point of user-agent at all if it needs to be set to some changing, magical incantation in order for a browser (or any other agent) to be functional?

I hate the direction the internet and tech is going, and I hate even more that I'm seemingly powerless to do anything about it


You need it for work. Meanwhile, TSA has caused more overall economic damage than the 9/11 plane crashes.


Welcome to this brave new world where technology is accessible to all.

I hate it.


Indeed - I travel by train as much as I can.

The web sucks. Society/civilization is shaking in its foundations.

I just wish the passive non-violent approach would work. It worked for Gandhi, but in this day?

I feel we're all getting overrun by technology, which is unfortunate, as it could have been the opposite.


You'd be surprised. The /good/ bots do this, but there's a lot of white noise garbage that simple techniques do still filter out.


Speaking from experience: there are a ton of bots that don't set UA and use whatever their request library sets.
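A rough server-side version of that check, sketched in Python: flag requests whose User-Agent starts with the default string of a common HTTP client library. The prefix list is an illustrative assumption, not what Cloudflare or any other service actually uses.

```python
# Prefixes of the default User-Agent strings sent by a few common HTTP
# clients when the caller doesn't override them. Illustrative, not exhaustive.
LIBRARY_UA_PREFIXES = (
    "python-requests/",
    "Python-urllib/",
    "curl/",
    "Wget/",
    "Go-http-client/",
)


def looks_like_library_default(user_agent: str) -> bool:
    """True if the UA is an unmodified HTTP-library default."""
    # str.startswith accepts a tuple and matches any of its prefixes.
    return user_agent.startswith(LIBRARY_UA_PREFIXES)


print(looks_like_library_default("python-requests/2.22.0"))  # True
print(looks_like_library_default(
    "Mozilla/5.0 (X11; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0"))  # False
```

On its own this only catches the lazy bots; anything that copies a browser UA sails through, which is why it would be one weak signal among many.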


Pure conjecture: The "security solution" probably wanted to ban the user for a reason unrelated to the UA string, and was only able to (i.e., the user was only identifiable uniquely enough) because of the odd UA string. Switching to the standard UA string places the user into a state sufficiently non-unique as to be unidentifiable and thus unblockable.


If you omit the user-agent string or, even better, the User-Agent header itself, everything will be fine, too.

Tested with Cloudflare and many, many other servers over many years.

On the whole, taking the entire web into account, it is rare for a user-agent string to be required.
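For anyone who wants to try the "no User-Agent header at all" approach from a script, here is a minimal sketch using Python's standard library (the choice of client is an assumption; the comment doesn't say what was used). `urllib.request.build_opener()` preloads a `Python-urllib/3.x` User-Agent in `addheaders`; clearing that list stops the header being sent. A throwaway local server echoes back what it received, so no external site is involved.

```python
import http.server
import threading
import urllib.request


class EchoUAHandler(http.server.BaseHTTPRequestHandler):
    """Replies with the User-Agent the client sent, or '<none>' if absent."""

    def do_GET(self) -> None:
        body = (self.headers.get("User-Agent") or "<none>").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:  # silence per-request logging
        pass


def make_opener(send_user_agent: bool) -> urllib.request.OpenerDirector:
    # ProxyHandler({}) ignores proxy environment variables so the local
    # demo server is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    if not send_user_agent:
        # build_opener() preloads ("User-agent", "Python-urllib/3.x");
        # with an empty addheaders list no User-Agent header is sent.
        opener.addheaders = []
    return opener


def fetch_ua(opener: urllib.request.OpenerDirector, url: str) -> str:
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode()


server = http.server.HTTPServer(("127.0.0.1", 0), EchoUAHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

default_ua = fetch_ua(make_opener(send_user_agent=True), url)
no_ua = fetch_ua(make_opener(send_user_agent=False), url)

server.shutdown()
server.server_close()

print(default_ua)  # Python-urllib/<major>.<minor>
print(no_ua)       # <none>
```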

However, it has become common for servers to make many assumptions based on user-agent strings.

I would guess there are many tech workers whose entire job rests on the assumption that user-agent strings are always present, rarely manipulated^1 and accurately represent the user's hardware and software.

1. For example, changed using "Developer Tools" in the major browsers. Google's browser has some user-agent presets for "testing" in DevTools (Ctrl-Shift I, Ctrl-Shift P, Drawer Show Network Conditions). Those should be safe to use for logins to Google websites. Try them out, e.g., when logging into Gmail and watch how the user can request vastly different web page styles based only on user-agent string.


There are a number of sites that simply crash with a web framework backtrace or behave strangely when the User-Agent header is not sent.


That sounds worth reporting if possible. Assuming the backtrace is also written to a log, it might be a denial-of-service weak point.
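A minimal sketch of the crash described above (a site falling over when the User-Agent header is absent) and the defensive fix. The function names are hypothetical, and a plain dict stands in for the request's header mapping; real frameworks usually provide a case-insensitive one.

```python
from typing import Mapping


def client_family_buggy(headers: Mapping[str, str]) -> str:
    # Assumes the header is always present. When it isn't, this raises
    # KeyError, which a framework typically turns into a 500 / backtrace.
    ua = headers["User-Agent"]
    return "firefox" if "Firefox/" in ua else "other"


def client_family(headers: Mapping[str, str]) -> str:
    # Treat a missing or empty header as its own case instead of crashing.
    ua = headers.get("User-Agent")
    if not ua:
        return "unknown"
    return "firefox" if "Firefox/" in ua else "other"


print(client_family({}))  # unknown
print(client_family({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0"}))  # firefox
```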


It appears that setting it to the same as Chrome's does indeed work!

For context, this is what I had set (and, for quite some time, it was working): "Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecho/20100101 Firefox/57.0"

Ironically, I set this so that I could keep logging in to Google, since I had been unable to log in to Google Apps without setting this user agent string.

What did it fail on? The misspelling of "Gecho"?


It's the severely-outdated Firefox version number. Spambots and crawlers sometimes have user-agent strings corresponding to very old browsers, because they were set once when the bot was created and then never updated. On an unrelated site that I run, we get a lot of traffic with user agent strings corresponding to implausibly-old browsers, and it's ~100% bots.
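The "implausibly old browser" heuristic could look roughly like this in Python. The cutoff version is a hypothetical value chosen for illustration; real services tune their own thresholds and combine many signals rather than blocking on one.

```python
import re

# Hypothetical cutoff: Firefox major versions below this are treated as
# "implausibly old". Real services tune (and don't publish) their own value.
MIN_PLAUSIBLE_FIREFOX = 68

# Matches the version token of a Firefox UA, e.g. "Firefox/57.0".
FIREFOX_VERSION = re.compile(r"\bFirefox/(\d+)")


def looks_like_stale_firefox(user_agent: str,
                             min_major: int = MIN_PLAUSIBLE_FIREFOX) -> bool:
    """True if the UA claims to be a Firefox older than min_major.

    Returns False for non-Firefox UAs: this single heuristic has no opinion
    about them, and a real scorer would combine it with other signals.
    """
    match = FIREFOX_VERSION.search(user_agent)
    if match is None:
        return False
    return int(match.group(1)) < min_major


ua_from_thread = "Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecho/20100101 Firefox/57.0"
print(looks_like_stale_firefox(ua_from_thread))  # True
```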


November 2017 is “severely outdated”?

https://www.mozilla.org/en-US/firefox/57.0/releasenotes/


Two full years for an evergreen web browser, which contains probably the largest surface area for software exploits of anything on the machine? I’d argue absolutely yes.

As others have echoed, this is probably a huge marker for malicious bots to Cloudflare.


The evergreen browser is a thing, but the idea that everyone can trivially upgrade those browsers is promulgated as true when it's a bit of a myth.

Upgrading a browser is sometimes expensive. Developers call browsers "evergreen" so they can avoid the annoying support expense of the few percent of people who can't upgrade.

I had a phone running a Mozilla browser, which received updates until it didn't any more.

Then the only way to upgrade browser was to purchase a new smartphone.

Unfortunately it was a superb device with no newer replacement, so to upgrade browser I had to downgrade my smartphone for other uses, and pay the cost of an expensive new smartphone despite not really wanting one. But sites saw it as "you are running an old Firefox, you obviously can trivially upgrade".

I still have a perfectly great old Android tablet running an old version of Chrome which cannot be updated. Other than website compatibility, everything on it that it is used for is still working flawlessly. Perfect screen, sound, wifi, memory, battery.

For now, enough sites work on it that I still use it. That can be replaced easily with another tablet, but it is disappointing to have to spend cash and throw away a working product to e-waste, just to replace it with a functionally identical device because of the way the software treadmill works. (It doesn't have to work like that, it's a choice made by developers collectively.)


Yes, plus one of FF's upgrades slipped in a change that ignored your "allow unsigned extensions" setting, which broke a vital UX extension I had been maintaining after it got abandoned (Pentadactyl: I had gotten so used to clicking links from the keyboard that it was really frustrating when I suddenly couldn't; fortunately, similar projects have since carried the torch).

I mean, they said they gave long notice for the change, but I didn't think that a browser that "empowered users" and "gave them control of their machines" would ever do that. I mean, if every change has to be approved by Mozilla, why not just shrink wrap the browser and make me get it from Microsoft at Best Buy?

https://www.youtube.com/watch?v=taGARf8K5J8


Even a month of no updates to browser is a bad idea.


Between the huge and complex attack surface and being exposed to a huge number of untrusted websites, running a browser without security updates is pretty risky. So I'd call any unsupported browser "severely outdated".

Long term support (ESR) Firefox releases are supported for about 15 months from release. And even that means using a major version that old, not a point version that old. Firefox 57 wasn't even an ESR, so it went out of support a couple of months after release.


Most certainly, like any complex app that needs to interact with potentially hostile services.


I always recommend setting custom user-agents for a problematic page instead of setting them globally.

For the Google issue, qutebrowser v1.9.0 does that already, see https://github.com/qutebrowser/qutebrowser/issues/5182
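The per-site idea in general terms, as a Python sketch (this is not qutebrowser's actual config API): keep one default UA and override it only for hosts matching a pattern. The UA strings and the `*.google.com` pattern are illustrative assumptions.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Illustrative UA strings; substitute whatever your browser actually sends.
DEFAULT_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0"
CHROME_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36")

# Hypothetical per-site overrides: host glob -> UA sent only to matching hosts.
# Note that "*.google.com" matches subdomains but not the bare "google.com".
SITE_UA_OVERRIDES = {
    "*.google.com": CHROME_UA,
}


def user_agent_for(url: str) -> str:
    """Pick the UA for a request: first matching override, else the default."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    for pattern, ua in SITE_UA_OVERRIDES.items():
        if fnmatchcase(host, pattern):
            return ua
    return DEFAULT_UA
```

Every other site keeps seeing the browser's real UA, so a spoof needed for one service doesn't trip heuristics like the one in the original post.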


So can Falkon :)

Having a Chrome UA is a MUST on WebKit-based browsers if you want Google's taxing services such as Earth, Maps, Gmail and so on to be faster and smoother than ever. Seriously.

Once you open Street View on luakit/vimb with a Chrome UA, the diff is night and day.


I tried Street View in vimb. I don't see any difference: it's slow to the point of being almost unusable, while it works fine in qutebrowser.


In ~/.config/vimb/config:

    set hardware-acceleration-policy=always
    set webgl=true


That seems to help, but I still don't see a difference with/without a Chrome UA.


Try a mobile Chrome UA, such as the one for a recent Galaxy Tab.


That User-Agent won't trigger the block page you were experiencing.

No clue about the issues with Google, perhaps some feature detection going on?


Nope, it's Google trying to ban "embedded browser frameworks" - see https://github.com/qutebrowser/qutebrowser/issues/5182 for details.


I used plain Firefox and was still banned just fine. It didn't start in 2019 either. Chrome is their cash cow; if you don't use it, you're a liability.



I find it very annoying that the authors thought it would be cute to use another full name for MITM.


My wild unfounded guess: they’re trying to make it gender-neutral.


Pretty much. Link to probably the first article I saw using it: https://news.ycombinator.com/item?id=20673409

> It’s the same thing, recognizing that the MITM is neither male, nor human at all.

I don't see why this is important for a technical term. People hear the term as a slug, a group of words, not as discrete ones. No one actually pictures a man or anything else in the middle upon hearing the term. The difference is that the purpose of language is to communicate with others, and everyone understands man in the middle. I look up the "alternative" and get more results for "Henry the Hugglemonster" than I do for network traffic interception.


> No one actually pictures a man or anything else in the middle upon hearing the term.

Thanks, I’ve always wanted someone to mansplain to me how I hear terms and what I picture while I hear them.


I can see both sides of the argument here, but don't really have an opinion. Perhaps if I weren't a middle-aged, middle-class, white male in a Western country, I'd feel more strongly about it. As it is, I do feel a bit of "social justice fatigue" on issues like this.


I set privacy.resistFingerprinting to true in all Firefoxes I use. This also sets the user agent to something common.


At least now people can see why Google want to deprecate the User-Agent string.


The problem presented by services like ReCaptcha and Cloudflare is a tough nut to crack.

They're silently embedded in a huge portion of modern websites, and the average user will never even know about them.

But it seems to be way too easy for them to blanket-ban or serve an absurd number of captchas to power users, Linux gurus, privacy geeks, or anyone with the wrong combination of browser and add-ons. And the failures (as in this case) are often silent, cryptic, unfixable from the user's end, and can lock us out of massive swaths of the internet. Any thoughts on this conundrum?

Solutions:

1. Everyone stops using ReCaptcha/Cloudflare.

- Never going to happen. They dominate the market because they are useful, well-made services.

2. Launch a competing product that accomplishes the same thing.

- Good luck competing with these giants. Also, how would your implementation differ to solve this issue?

3. Power users and tech nerds must conform to 'normal' browser configurations and disable privacy add-ons in order to enjoy the internet alongside 'normal' users.

- Two steps backward in every conceivable way. The giants gain more invisible power and power users suffer reduced productivity and privacy. Not going to happen.


Yeah, you can't really talk about downsides of Recaptcha/Cloudflare without also acknowledging the extreme amount of malicious actors and abuse on the internet.

We're in a "this is why we can't have nice things" predicament and you have malicious actors to thank for that, yet most people on HN only seem capable of attacking the few affordable solutions to that problem.

I'm even down with the theory that Cloudflare is a US government outfit, that's the only way I can wrap my head around such a generous free tier. But at what point does it worry you that the internet has so many fundamental issues that people willingly centralize behind such a large behemoth? How many options do I have when a kid is holding my forum hostage with a $5 booter service?

It's easy to shit on everything. Let's hear some real solutions.


> Let's hear some real solutions.

It's by no means a full solution (there likely is no single full solution), and it may even be a bad solution -- but lately I've been trying to think about what the Internet would look like if we didn't have a massive arbitrage potential around server requests.

Part of the reason why everyone is trying to detect bots is because bots will very, very rapidly eat up your bandwidth and CPU time. We're used to offering our bandwidth/CPU for free to humans and either swallowing the cost if we're running a free service, or making up the cost in an adjacent way (ads, subscriptions, etc...). It's not bots that are the problem. It's that when someone asks our servers to do something, we do it for free. Bots are just a big category we can ban to make that problem smaller.

In many (but not all) cases, we shouldn't care about bots, and the only reason we do is because our systems aren't scalable to that level.

So I've been wondering lately what a server-defined per-pageload, or even per-request fee would look like on the Internet, maybe one that scaled as traffic got heavier or lighter and that was backed by a payment system that wasn't a complete rubbish privacy-disrespecting dumpster fire.

My immediate thought is, "well, everything would be expensive and inaccessible." But, the costs don't change. You still have to pay server costs today. Businesses today still need to make that money somehow. There are almost certainly downsides (all our current payment systems are horrible), but I wonder if it's more or less efficient overall to just be upfront about costs.

Imagine if I could put up a blog on a cloud service anywhere with scalable infrastructure. Then a post goes temporarily viral. Imagine if my server could detect it was under heavy load, detect that it was getting hit by bad actors, automatically increase the prices of requests by a fraction of a cent to compensate, and then automatically ask my provider to scale up my resources without costing me any extra money?

For a static site, suddenly I don't need to care whether people or bots are hammering it; I only need to care whether each visitor or bot is paying for the tiny hosting cost they're foisting on me. If bad actors start pushing traffic my way, I don't need to ban them. I just make them pay for themselves.
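One way to read "increase the prices of requests by a fraction of a cent" as code: a per-request price that stays at a base rate under normal load and scales linearly once load exceeds what the site comfortably handles. The base price, the linear rule and the parameter names are all hypothetical, and this says nothing about the payment system, which is the hard part.

```python
from decimal import Decimal

# Hypothetical base price: $0.0001 per request at normal load.
BASE_PRICE = Decimal("0.0001")


def price_per_request(current_rps: float, comfortable_rps: float,
                      base: Decimal = BASE_PRICE) -> Decimal:
    """Scale the price with load; never drop below the base price.

    current_rps:     requests per second being served right now
    comfortable_rps: load the current resources handle without scaling up
    """
    if comfortable_rps <= 0:
        raise ValueError("comfortable_rps must be positive")
    multiplier = max(1.0, current_rps / comfortable_rps)
    # Decimal avoids float rounding noise on fractions of a cent.
    return (base * Decimal(str(multiplier))).quantize(Decimal("0.000001"))


print(price_per_request(50, 100))   # 0.000100 (normal load: base price)
print(price_per_request(400, 100))  # 0.000400 (4x load: 4x price)
```

The extra revenue at 4x load is what would pay for the provider scaling up, which is the "without costing me any extra money" part of the idea.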


> automatically increase the prices of requests by a fraction of a cent to compensate

Great concept.

CPU, bandwidth, electricity, it's all just energy. And to a significant degree, money is just energy stored. I generate energy with my own work, store it in the form of money, and then transfer that energy to someone else, maybe to heat my home or cook me a meal.

Before money, I had to barter for those things. Maybe conceptually the internet is in a similar state at the moment. It doesn't have 'money'. Why can't I put CPUs in my wallet and then spend them? And why can't I charge visitors to my site by the CPUs they are costing me?

Instead, I have to, in a way, barter. For example, maybe I use ad revenue to earn my income, so I generate all this content, I barter that to the search engines, which barter with the advertisers, which barter with me, and I barter back to security guards to protect me from 'bad' actor bots. I'd really just like to receive CPU and bandwidth payments from them.


Isn't the reason we are freed from barter in daily life that the government is intimately involved in the financial and banking system, regulating it, issuing money and so on? Maybe we continue to struggle with the internet because it started out unregulated and has never really moved past that, because people insist that freedom is best for commerce without appreciating the nuances.


There are alternatives to that. For all of the hype and vaporware of the cryptocurrency movement, the idea of digital-native programmable internet money is a powerful one. I’m personally excited by the idea of involving currency at the protocol level and having it interact naturally over tcp/ip and http. There is an alternative to ads if we can make it work.


> Before money, I had to barter for those things

Not at all. Barter was quite uncommon and also impractical. Most societies used (and use) social connections and trust.


That's how ads work. More visitors more pageviews/clicks. People who serve ads don't want to pay for bots which is why they are a problem.

Doesn't medium do this?


> That's how ads work. More visitors more pageviews/clicks.

That's not asking people to pay for bandwidth/compute power, it's selling something adjacent to your content that you hope makes up for the loss.

> People who serve ads don't want to pay for bots which is why they are a problem.

That's kind of my point. When you ignore the arbitrage potential of serving requests for free, it forces you to care about making sure that your content is only available to the "right" users. You have to care about things like scraping/bots, because you're not directly covering your server costs, you're swallowing your server costs and just hoping that ads make up the difference.

Theoretically, in a world where server costs were directly transferred to the people accumulating those costs, you wouldn't need to care about bots. In fact, in that world, you shouldn't care whether or not I'm using an automated browser, since digital resources aren't limited by physical constraints.

In most cases, the only practical limit to how many people can visit a website is the hardware/cost associated with running it. A website isn't like an iPhone where we can run out of physical units to sell. So if they're paying for the resources they use, who cares if bots make a substantial portion of your traffic?

> Doesn't medium do this?

No, Medium just sells subscriptions, you don't pay for server usage. As far as I know, no one does this -- probably in part because of problems I haven't thought of, also probably in part because there are no good micro-payment systems online (and arguably no really good payment systems at all).

The closest real-world example is probably AWS, where customers pay directly for the resources they use. But those costs aren't then passed directly on to end users.


If you had to pay for each request you would make fewer and limit the requests to serious requests (school, work, medical).

Having said that, you could provide a central service where people would buy credit to be used on many sites. So the micropay isn't the problem.


> you could provide a central service where people would buy credit to be used on many sites.

That central service is going to lock out many countries and regions as well as lots of people (minor, unbanked, poor, etc.) in non-locked out countries and regions. Payment is frigging hard especially on the international scale. This is every bit against freedom of information and strictly worse than Cloudflare.


> Part of the reason why everyone is trying to detect bots is because bots will very, very rapidly eat up your bandwidth and CPU time.

It is?

I thought bot detection was only done during registration etc., to stop bots from sending spam to real users.

If anything, the JavaScript world we live in helps combat this. You need insane resources on the client just to have a page open, several orders of magnitude more than the server needs to generate and send that page.


In that case, an IP or IP block throttling is good enough.

Except then there are those pesky CGNATs to handle, including the Great Firewall of China.
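A minimal sketch of IP-block throttling, assuming a token bucket keyed by the client's /24 (IPv4) or /64 (IPv6) network. The rate, burst and prefix lengths are hypothetical. It also makes the CGNAT problem concrete: every user behind the same prefix shares a single bucket.

```python
import ipaddress
import time
from typing import Callable


class PrefixRateLimiter:
    """Token bucket per network prefix (/24 for IPv4, /64 for IPv6)."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec  # tokens refilled per second
        self.burst = burst        # bucket capacity
        self.clock = clock
        self.buckets: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_seen)

    @staticmethod
    def _key(ip: str) -> str:
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 64
        return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))

    def allow(self, ip: str) -> bool:
        key = self._key(ip)
        now = self.clock()
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        # Refill for the time elapsed since this prefix was last seen.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

Usage: `limiter = PrefixRateLimiter(rate_per_sec=1.0, burst=2)`, then call `limiter.allow(client_ip)` per request. Two clients at 203.0.113.5 and 203.0.113.9 drain the same bucket, which is exactly what hurts users behind a carrier-grade NAT.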

Anyway, high-profile spammers will emulate enough of the browser to render any measure based on browser anomaly detection worthless, including by using a headless browser. The only way to defeat them would be to add some fairly computationally intensive JS operation (on par with mining, ruining it for all the laptops, phones and tablets, though you can make it not trigger every time). This would make spamming expensive.
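That "computationally intensive operation" is usually called proof of work, with hashcash as the classic example. Below is a minimal Python sketch of both halves, assuming SHA-256 and a leading-zero-bits difficulty target. In a real deployment the solver would run as JavaScript in the browser, and the challenge would have to be bound to the session and expire. Each extra difficulty bit doubles the client's expected work, while verification stays a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros above the highest set bit
        break
    return bits


def make_challenge() -> bytes:
    """Server side: a random, unguessable challenge."""
    return os.urandom(16)


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (expected 2**difficulty_bits hashes)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash to check the client's answer."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


challenge = make_challenge()
nonce = solve(challenge, difficulty_bits=12)  # about 4,096 hashes on average
print(verify(challenge, nonce, difficulty_bits=12))  # True
```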

Server-side we have excellent AI spam filters that nobody seems to be using to fire off a captcha check later. The big problem here is that you cannot offload to some provider without inviting big privacy concerns. (Same problem as forum/chat/discussion platform providers.)


> high profile spammers will emulate enough of the browser to render any measure based on browser anomaly detection worthless

Based on actual experience of fighting spammers, that isn't the case. Like a lot of people new to spam fighting you're making assumptions about the adversaries that aren't valid.


There are many different types of spammers and attackers.

Some will be stopped by the simplest protection mechanisms.

Some will be indistinguishable from real humans, and you won’t be able to stop them without crippling your services for your real users.

But those are the two extremes. The real problem is the ones between those extremes.

Every intentional stumbling block you put in the path to try and stop those in the middle might also have a negative impact on your real users. The real problem is that the most troublesome attackers will learn and adapt to whatever stumbling blocks you put in the path. So, how many of your own toes are you willing to sacrifice with your foot guns in the name of stopping the attackers?


Very few, but that's OK. Good spam fighters don't have to sacrifice many or really any toes to stop nearly all spam. You seem to be assuming a linear relationship between effort and false positives, but that would be a very ineffective spam fighting team relative to the ones I've worked on. In practice you can have nearly no false positives combined with nearly no false negatives.

This isn't easy and many firms fail at it, but it can be done and we routinely did it.


No. Botnets are large and broadly distributed enough to render protection methods based only on the IP or IP block ineffective. They're commonly used for mailbombing attacks such as those described here: https://www.wired.com/story/how-journalists-fought-back-agai...

Do you think a botnet with 10k machines is going to be meaningfully inhibited by making each machine's cpu run calculations for a second or two for each submission?
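The arithmetic behind that question, as a short Python sketch. It assumes each machine works on one puzzle at a time; bots with several cores would manage proportionally more.

```python
def sustained_submissions_per_second(machines: int,
                                     cpu_seconds_per_submission: float) -> float:
    """Throughput of a botnet where each machine solves one puzzle at a time."""
    return machines / cpu_seconds_per_submission


per_second = sustained_submissions_per_second(10_000, 2.0)
per_hour = per_second * 3600
print(per_second, per_hour)  # 5000.0 18000000.0
```

At around 18 million form submissions an hour, a couple of seconds of CPU per submission barely slows a mailbombing campaign, while a human on an old phone feels every one of those seconds.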

I'm sure reCAPTCHA looks at the IP and IP block as one of the inputs to its ML algorithm, but as one or two of perhaps a dozen different features - including mouse movement and/or keyboard input, which is quite a bit harder to fake.


IP block for registration?

Seems highly unrealistic.


I used to work on spam fighting.

This sort of solution is frequently proposed but doesn't work, because:

• Serving costs are rarely the problem. Normally it's annoying actions taken by spammers and the bad reaction of valuable users that matters, not the machine cost of serving them.

There are occasional exceptions. Web search engines ban bots because left unchecked they can consume vast CPU resources but never click ads. However, they only get so much bot traffic because of SEO scraping. Most sites don't have an equivalent problem.

• There is no payment system that can do what you want. All attempts at creating one have failed for various hard reasons.

• You would lose all your users. From a user's perspective I want to access free content. I don't want to make micropayments for it, I especially don't want surge pricing that appears unrelated to content. Sites that use more typical spam fighting techniques to fend off DDoS attacks or useless bot traffic can vend their content to human users for free, well enough that only Linux users doing weird stuff get excluded (hint: this is a tiny sliver of traffic, not even a percentage of traffic but more like an occasional nuisance).

• You would kill off search engine competition. Because you benefit from crawlers, you'd zero rate "good" web bots using some whitelist. Now to make a new search engine I have to pay vast sums in bot fees whilst my rich competitors pay nothing. This makes an already difficult task financially insurmountable.

The current approach of using lots of heuristics, JavaScript probes and various other undocumented/obscure tricks works well. Cases like this one are rare, caused by users doing weird stuff like committing protocol violations and such users can typically escalate and get attention from the right operators quickly. There are few reasons to create a vast new infrastructure.


> most people on HN only seem capable of attacking the few affordable solutions to that problem.

I doubt that many would attack those solutions if they actually worked well, but they don't. These "solutions" are a big part of the reason why the web gets smaller for me every day as more and more websites become unusable.


Cloudflare is like the TSA for the internet; I'm not convinced it needs to be as aggressive as it is. And yes, I know websites have some control over how aggressive it is, but much like Reddit's moderation policy, it takes a safety-over-everything approach, which hits enough false positives at the edges to be a serious problem.

Cloudflare is very much anti-internet, and I say that as a very security-obsessed person. Just as with Reddit, I believe we need to dial things back a bit toward chaos. Think of it as a Venn diagram, (safety)[x](chaos): there's a balance, and I believe the internet is worse off when that balance is out of whack.

There might be some awful stuff on sites like 4chan, but it also generated a ton of the memes that later filtered down into mainstream internet culture. Culture and innovation often happen in the chaos and at the fringes, an area I believe the world is becoming completely intolerant of in some attempt at idealism. But there are real sacrifices in between (i.e., the mostly harmless stuff getting tagged as bad guys).

We need to be better at calming down and embracing the chaos, pushing back against FUD, and maintain a good balanced default. That chaos and flexibility is what originally made the internet great and endlessly promising.

Based on the various posts I've seen from Cloudflare founders on here I'm not convinced they are taking this problem as seriously as they need to be.


A comparison to TSA is flawed. Captcha is not a pass fail system, it is a score that is passed on to the web host and they decide what to do with it. Really any similar product to block malicious users would have the same problems, and the solution is to educate the website operators so they can avoid blocking legitimate users.


They do work well for the vast majority of people. Only on HN do I ever see people complaining about cloudflare/captchas/etc.


I don't dispute that they work well for the majority of people -- but the majority of people are not security-conscious.

However, I see people complaining about Cloudflare in lots of places other than here. The number of people adversely affected by Cloudflare is not small.


I see lots of complaints about Captchas in the “real” world, too. Not regarding the centralisation etc. aspect but more regarding how painful they are to complete correctly, but there are definitely complaints.

Regarding Cloudflare, a regular user will have no idea about what Cloudflare is and what they do. If something like the OP happens to them, they will just figure “the site is broken” and move on. So there could be a large hidden number of users who have suffered from overzealous Cloudflare blocking without being able to identify it as such.


> It's easy to shit on everything. Let's hear some real solutions.

My solution more and more is to just not bother with it. If a site is unreadable because I'm using uBlock and uMatrix, and I have to spend more than a minute or two tweaking things, then I just leave.

That said, I don't have any problem with Cloudflare. I'm much more annoyed by the overuse of *.googleapis.com. I'd love it if somebody would set up a service that I could point my hosts file at so that googleapis.com silently went somewhere else.


I've been thinking about a local proxy that caches CDN assets. The first request to a cdn URL goes through, subsequent requests come from cache.

I think it would work fine with versioned libraries, fonts, etc. I'm thinking of setting up a container and squid config to achieve this.

Any obvious problems or alternative solutions?

Obviously, enumerating the world's CDN URLs would be a big task, but I think even covering the most common CDNs would help.
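The core logic of such a proxy, sketched in Python rather than as a Squid config: cache responses only for an allowlist of CDN hosts and pass everything else through. The host list is an example, the `fetch` callable stands in for the real upstream request, and the cache assumes versioned URLs (such as `.../jquery@3.4.1/...`) that never change under the same path.

```python
from typing import Callable
from urllib.parse import urlsplit

# Example allowlist of public CDN hosts; extend or trim to taste.
CDN_HOSTS = {"cdnjs.cloudflare.com", "cdn.jsdelivr.net", "unpkg.com"}


class CdnCache:
    """In-memory cache for CDN assets; everything else passes straight through.

    Assumes CDN URLs are versioned, so a cached copy never goes stale.
    Unversioned URLs would need expiry logic (e.g. honouring Cache-Control).
    """

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                  # performs the real network request
        self._store: dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        host = urlsplit(url).hostname or ""
        if host not in CDN_HOSTS:
            return self._fetch(url)          # not a known CDN: always upstream
        if url not in self._store:
            self._store[url] = self._fetch(url)  # first request goes through
        return self._store[url]              # later requests served locally
```

Unlike the browser's own HTTP cache, a shared proxy like this would also serve other browsers and devices on the same network.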


I mean, your browser basically already does this using the Cache-Control and Expires headers, which all CDNs are going to set.


Something like https://decentraleyes.org/?


uMatrix is great for blocking 3rd party stuff globally in your browser. Outside of the browser, I rely on DNS blocking rather than modifying hosts files.

Wrote a little post about how I configured my blacklists and whitelists with AdGuard Pro for iOS.

https://www.calebyers.com/blog/dns-ad-blocking.html


Cloudflare is publicly traded under the symbol NET and has quite a number of institutional investors. A list can be found here: https://old.nasdaq.com/symbol/net/institutional-holdings

If all those companies are fronts for various parts of the US intelligence community then we're really screwed, I suppose.


That's... not how that would work.


The problem is that the narrative has been poisoned by Cloudflare and Google (for Recaptcha) - they both overstate the size of the problem, as well as the effectiveness of their solution.

In other words: when someone demands "real solutions", they're typically expecting a degree of solution that quite likely just does not exist at all, to solve a problem that isn't as severe as people believe, just because that's the bar that those companies have set in the public discourse.

This makes it impossible for well-intentioned people to 'compete' with these services, because whatever alternative is suggested (hidden form elements, a random VPS provider with DDoS mitigation, serving assets locally, etc.) is immediately dismissed as "that can't possibly be as effective / effective enough", even though it'd be perfectly adequate for the vast majority of cases.
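As an example of how simple the "hidden form elements" approach can be, here is a Python sketch of a honeypot check plus a minimum fill time. The field name and the 3-second threshold are hypothetical, and `rendered_at` is assumed to come from something the client can't forge (e.g. a signed timestamp embedded in the form).

```python
# Hypothetical names and threshold; pick your own.
HONEYPOT_FIELD = "website"   # present in the form, hidden from humans via CSS
MIN_FILL_SECONDS = 3.0       # humans rarely submit a form faster than this


def looks_like_bot(form: dict[str, str], rendered_at: float,
                   submitted_at: float) -> bool:
    """Flag a submission that fills the hidden field or arrives implausibly fast."""
    if form.get(HONEYPOT_FIELD, "").strip():
        return True  # humans never see the field, so it should stay empty
    return (submitted_at - rendered_at) < MIN_FILL_SECONDS
```

Bots that blindly fill every field trip the first check, and bots that post the form immediately trip the second; neither costs a legitimate user anything.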

The alternative and competitive solutions exist, and have existed for a long time. You don't need a 1:1 replacement for these services. People just often refuse to believe that the simple alternatives work, and won't even bother trying.

(For completeness, my background is that of having run several sites dealing with user-submitted content, including some very abuse-attracting ones.)


> because whatever alternative is suggested (hidden form elements, a random VPS provider with DDoS mitigation, serving assets locally, etc.) is immediately dismissed as "that can't possibly be as effective / effective enough", even though it'd be perfectly adequate for the vast majority of cases.

They are immediately dismissed because I don't want to pay a fulltime engineer to play cat and mouse with skiddies on the internet.


I think you're confusing what you wish was true with what is actually true. For instance, here was a post from a few weeks ago about how one annoyed user was able to take down a Mastodon instance until the admin gave up and put it behind CF: https://news.ycombinator.com/item?id=21719793. Bear in mind, if you're running a Mastodon instance, you're probably well-aware of the downsides of centralization and would only give in as a last resort.

CF has problems, but pretending it isn't solving a real issue that is nearly impossible to fix otherwise, especially for individual admins running a side project, doesn't help anybody.


> I think you're confusing what you wish was true with what is actually true.

And you are cherry-picking poorly sourced anecdotes to better suit your position.

A VPS with a 100Mbps virtual adapter physically can't withstand a DoS from a single attacker with a fiber connection (or its equivalent). This doesn't have much to do with the anatomy of DoS attacks; it's just simple math.

Cloudflare subsidizes their free users by giving a bit of bandwidth for free: an amount that can be purchased from a decent host for several hundred dollars. Of course, an attacker with several hundred dollars can easily rent a botnet that will demolish that "protection".
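To make the "simple math" concrete, here is a minimal sketch assuming the attacker's fiber line delivers about 1 Gbps (a figure not given in the comment) against the 100 Mbps virtual adapter. Once attack traffic alone exceeds link capacity, no server-side filtering helps, because the link is full before the server ever gets to inspect a packet.

```python
def overload_factor(link_mbps: float, attack_mbps: float) -> float:
    """How many times over capacity attack traffic alone drives the link."""
    return attack_mbps / link_mbps


def link_saturated(link_mbps: float, attack_mbps: float, legit_mbps: float = 0.0) -> bool:
    """True if attack plus legitimate traffic exceeds what the link can carry."""
    return attack_mbps + legit_mbps > link_mbps


if __name__ == "__main__":
    # Assumed figures: 100 Mbps VPS adapter vs. one attacker on ~1 Gbps fiber.
    vps_link_mbps = 100.0
    attacker_mbps = 1000.0
    print(f"overload: {overload_factor(vps_link_mbps, attacker_mbps):.0f}x")  # overload: 10x
    print("saturated:", link_saturated(vps_link_mbps, attacker_mbps))        # saturated: True
```

The same arithmetic is why mitigation has to happen upstream of the bottleneck link, on infrastructure with more capacity than the attacker has.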


Huh?

"All Cloudflare plans offer unlimited and unmetered mitigation of distributed denial-of-service (DDoS) attacks, regardless of the size of the attack, at no extra cost."

https://www.cloudflare.com/ddos/

Do you know of an example of an attacker "easily demolishing" Cloudflare's free DDoS protection for a website with a few hundred dollars worth of botnet?


> Do you know of an example of an attacker "easily demolishing" Cloudflare's free DDoS protection

I can name dozens of websites that folded under Cloudflare's supposedly flawless DDoS protection (at the time when they were still using it). Of course, the ones who fold are always the websites themselves; Cloudflare itself is never affected, because when the DDoS gets particularly bad, they just detach websites from their CDN and expose them to attackers.


...so name them?


If I care deeply about my site staying up, a solution that’s perfectly adequate for the vast majority of cases isn’t sufficient. I don’t want to end up in the mirror image of the original author’s situation, where my site randomly falls down and I have no way to figure out what’s wrong or fix it.


> Yeah, you can't really talk about downsides of Recaptcha/Cloudflare without also acknowledging the extreme amount of malicious actors and abuse on the internet.

What percentage of traffic to the long tail of the smallest 95% of websites served by CF is malicious, then? So that we can talk in numbers.


I have run a number of small and medium websites (20 users per month up to 2 million). At least 50% of the traffic I see in my logs includes some SQL injection or other mass script-kiddie BS.


That's fairly black and white. Blocking an unusual user-agent because you "think" it might be malicious is another thing.


It might be a poor business decision, but probably not for the reason most people would think.

An unusual UA is unlikely to move the needle on top line metrics, but it is a distraction and a misuse of resources to play cat and mouse. (Unless your business would be materially harmed by someone scraping your data... in which case, you’re doomed anyway.)


I've looked at my logs, and obvious nonsense (POSTs, or GETs with any query parameters, on a website that only has static HTML pages, which should never generate these kinds of requests) makes up about 1% of the last 25,000 requests.
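A rough sketch of that kind of log check, assuming Apache/Nginx "combined" log format and a purely static site, where anything other than a GET or HEAD without a query string counts as noise. The classification rule is the comment's heuristic, not a general-purpose attack detector.

```python
import re
from typing import Iterable

# Matches the quoted request line right after the timestamp, e.g. ] "GET /index.html HTTP/1.1"
REQUEST_RE = re.compile(r'\] "(?P<method>[A-Z]+) (?P<target>\S+)[^"]*"')


def is_noise(line: str) -> bool:
    """Heuristic for a static-only site: anything but a plain GET/HEAD is noise."""
    match = REQUEST_RE.search(line)
    if match is None:
        # Assumption: unparseable request lines (e.g. raw TLS bytes sent to port 80) count as noise.
        return True
    method, target = match.group("method"), match.group("target")
    return method not in ("GET", "HEAD") or "?" in target


def noise_fraction(lines: Iterable[str]) -> float:
    """Fraction of log lines classified as noise (0.0 for an empty log)."""
    entries = list(lines)
    if not entries:
        return 0.0
    return sum(is_noise(line) for line in entries) / len(entries)
```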


Why care about such traffic? Blocking it seems like a pointless exercise.


I was responding to OP's question. IIRC, we discussed it and never implemented any blocking.


> Let's hear some real solutions

The old Recaptcha, which did not need JS, did not serve you unsolvable challenges, and did not refuse to serve you because you used Tor or because you used the audio challenge too much.


Somewhat of a topic hijack and a naive question, but assuming Cloudflare is a government entity, wouldn't they still have to comply with whatever their terms of service / contracts with their users are? As they are a US company, barring illegality, theoretically they can't actually do shady shit without being in breach of contract right? They would also open themselves up to shareholder lawsuits.


If they were an actual part of government, sovereign immunity would be something that would have to be considered. In a nutshell, the government cannot be sued unless it decides to allow it.

The government has passed laws to allow itself to be sued under certain circumstances. The Federal Tort Claims Act (FTCA), for example, allows suits for a variety of torts.

I believe (but am not actually sure) that most normal business-type transactions with the government are covered under FTCA or other acts, so a breach of contract by Cloudflare-the-government-entity would probably be pretty much like a breach by any random non-government entity.

Still, if you were going to depend on that it would be a good idea to actually look into the details of the FTCA and other such acts and compare to the actual Cloudflare TOS.

I have no idea whatsoever how sovereign immunity works in the case of a corporation chartered under some state's corporate law (Delaware in the case of Cloudflare) that is owned (fully or in part) by the government. I'd guess that it could only possibly apply if the government owns enough of the company to have control.

Cloudflare is public, so we can probably not worry about that scenario. If the government actually controls them, it is doing it surreptitiously, and so even if sovereign immunity should be somehow applicable I'd expect that the government would not bring it up because doing so would necessarily bring to light their control.


For caching: Learn how to code. If your web page dies when there are only two visitors, then that's on you.

DDoS attack: If possible, the easiest solution is to just swallow the traffic. If that doesn't work, you want to block all networks that allow IP spoofing. Then it's a whack-a-mole game. And if you have the resources, use anycast and many colocation sites. Or ask your ISP for help.

Hiding your server: Use an onion address via the Tor network.

SSL certificate: Use Let's Encrypt.

Edge SSL/DNS/CDN: Use a fast web server or proxy, like Nginx. With Cloudflare, the connection to the edge server might be faster, but time to first byte (on your site) is often slower. So you get more bang for the buck by optimizing on your end.

Note that DNS by itself already has edge caching out of the box, for free! E.g., if a user looks up your domain, the answer will be cached both at their ISP and on their LAN. So you don't need Cloudflare for DNS.
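A minimal sketch of the TTL-based caching resolvers apply to DNS answers, which is what makes repeat lookups cheap without any CDN. The `upstream` lookup function and the 300-second TTL in the usage are hypothetical stand-ins; real resolvers also cache negative answers and handle many record types.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Caches name -> address answers until their TTL expires, like a resolver does."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry time)

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: force a fresh upstream lookup
            return None
        return address


def resolve(name: str, cache: TTLCache,
            upstream: Callable[[str], Tuple[str, float]]) -> str:
    """Answer from cache when possible; otherwise ask upstream and cache the result."""
    cached = cache.get(name)
    if cached is not None:
        return cached
    address, ttl = upstream(name)
    cache.put(name, address, ttl)
    return address
```

Only the first lookup within each TTL window reaches the authoritative server; every other client behind the same resolver is served locally.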


> Yeah, you can't really talk about downsides of Recaptcha/Cloudflare without also acknowledging the extreme amount of malicious actors and abuse on the internet.

But Recaptcha has been broken for years now by several different means. At this point, it is so broken it's almost a scam (and just another way for Google to get personal data from as many websites as they can).


> How are you going to talk about the downsides of Recaptcha/Cloudflare without also acknowledging the extreme amount of malicious actors and abuse on the internet?

This is acknowledged by the original question

>> Also, how would your implementation differ to solve this issue?


To be clear, I agree with their comment and tried to double down on their point that it's a tough nut to crack. I improved my opening line.


Whatever happened to proof-of-work protocols? I remember in the '00s they were being touted as The Solution™ to our spam/bot woes. Are botnets just so large that even PoW doesn't significantly affect them?


> malicious actors and abuse

It's hard to consider simply viewing content to be malicious or abusive, no matter how automated.


I used to work for a data-scraping firm and very often we would accidentally knock many web sites offline when we pointed our crawlers at them.

I'd love to agree with you, but the crawler problem is 100x worse today than it was a decade ago


This would be much better solved with IP-based rate limits. And if IP-based limits don't work, then you're dealing with a DDoS, and it doesn't sound like this case was DDoS protection.
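A minimal sketch of per-IP rate limiting using a token bucket, assuming a single-process server. The rate and burst values in the usage are illustrative, and a real deployment would also evict idle entries so the table cannot grow without bound.

```python
import time
from typing import Callable, Dict, Tuple


class PerIPRateLimiter:
    """Token bucket per client IP: refills `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last update time)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(ip, (self.capacity, now))
        # Refill for the time elapsed since this IP was last seen, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

A burst from one address drains only that address's bucket, so other clients are unaffected; the reply's objection still stands, since an attacker spread across many IPs gets a full bucket per address.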


IP-based rate limiting is easily foiled via proxies, VPN services, tor, or botnets


And user agent string based protections are even more easily foiled, that's why I don't believe this can be plausibly counted under DDoS protection.


> Yeah, you can't really talk about downsides of Recaptcha/Cloudflare without also acknowledging the extreme amount of malicious actors and abuse on the internet.

Cloudflare have a long history of supporting those malicious actors, so it's not like the problem is unrelated to the purported solution.


> Two steps backwards in every conceivable way. The giants gain more invisible power and powerusers suffer decreased productivity/privacy. Not going to happen.

I agree with the first two sentences, but disagree with the third. I believe that this state is actually the intended end goal.

Previously, for many years, I browsed the web with Javascript disabled. At the time, this had very little impact on my browsing experience; perhaps some of the layout would be broken, but not in a way that would interfere with the functionality or content of the site.

Nowadays, not only is this totally impossible, blocking even a subset of a site's JS (such as through uMatrix) is trial and error to get the site to load at all, or to do simple tasks like click on a "login" button.

With Google's plan to "phase out" cookies [0], I expect the web to become even more opaque and difficult to modify "on the fly" -- that is, on the user's local machine prior to displaying the content. In particular, this will affect ad and tracker blocking the most, as the pain from effective ad blockers starts to bite harder and harder.

So, when you say "The giants gain more invisible power", that is true and desirable from their perspective, and since they write the code that actually underpins most web browsers, why wouldn't they?

When you say, "powerusers suffer decreased productivity/privacy", yes, that's absolutely true. Why would they care? It's such a small fraction of their business. Some users will go to more and more extremes to preserve their privacy, eventually accessing only some small subset of sites from an esoteric Kali-derived distro, and others will capitulate and shift their behavior back to the herd.

In the end, the giants still win.

[0]: https://www.inc.com/jason-aten/google-says-chrome-will-end-s...


As always, we'll find another venue where the giants don't have a foothold yet, or one that malicious users aren't very interested in.

Rebooting the web into something else is still possible, and it will eventually happen when enough people are tired of the current state.

This battle has been lost long time ago.


Great points, I absolutely agree with everything you've said. The 'not going to happen' is less a prediction of the future and more a reflection of my personal stubbornness/frustration regarding the direction things are headed (increased 'opacity' as you put it).


I have JavaScript disabled by default; most of the internet works fine.


4. Power users and nerds need to upstream the privacy improvements to everyone.

- To do so you need to avoid making user experience worse


Apple's work on tracking protection in Safari is a great step toward this. Normalizing ITP across the whole Mac / iOS / iPad OS userbase means that sites have to accept it or block a huge number of normal users.


This is an interesting subject indeed. Even though both of them are in the "try to grab as big portion of internet traffic as possible" business I wouldn't compare them that easily.

Cloudflare is actually rather good at doing what they do (the DDoS side of things). They rarely break normal internet use; the only time that happens is when a site is put into "I'm under attack" mode, which forces the browser to pass a JavaScript check. They do get huge amounts of traffic information, but that is pretty much required for their core business (DDoS prevention, not the tinfoil kind).

Google/ReCaptcha is another thing. I have a hard time understanding any reason to put a captcha on a site when a normal incremental delay between login attempts, plus banning sources that keep trying for too long, would already prevent the abuse. They're getting traffic data and ML training data, and neither one is required for the thing captcha is trying to solve. Sites are just feeding Google's business, and captcha is actually making the internet a worse place for humans.

(A captcha requirement for things like posting in a discussion could be handled by simple spam/bot detection; captcha is just overkill.)
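A sketch of the "incremental delay plus ban" approach for logins, assuming failures are tracked per source (for example, per IP). The exponential doubling, the 300-second cap, and the 20-failure ban threshold are assumptions chosen for illustration, not anything the comment specifies.

```python
import time
from typing import Callable, Dict, Tuple


class LoginThrottle:
    """Per-source login throttle: exponential delay after each failure, ban after too many."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 300.0,
                 ban_after: int = 20, clock: Callable[[], float] = time.monotonic) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.ban_after = ban_after
        self.clock = clock
        # source -> (consecutive failures, earliest time the next attempt is allowed)
        self._state: Dict[str, Tuple[int, float]] = {}

    def check(self, source: str) -> str:
        """Return "ok", "wait", or "banned" for a new attempt from `source`."""
        failures, next_allowed = self._state.get(source, (0, 0.0))
        if failures >= self.ban_after:
            return "banned"  # in this sketch the ban never expires on its own
        if self.clock() < next_allowed:
            return "wait"
        return "ok"

    def record_failure(self, source: str) -> None:
        failures, _ = self._state.get(source, (0, 0.0))
        failures += 1
        delay = min(self.max_delay, self.base_delay * 2 ** (failures - 1))
        self._state[source] = (failures, self.clock() + delay)

    def record_success(self, source: str) -> None:
        self._state.pop(source, None)
```

A login handler would call `check` before even looking at the password and reject "wait" or "banned" outright, so a brute-forcer's attempt rate collapses without any captcha.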


> Launch a competing product that accomplishes the same thing.

Disclaimer: I was part of hCaptcha team.

https://hcaptcha.com/ is competing with ReCaptcha. It's a drop-in replacement for ReCaptcha.

It's privacy-focused (supports Privacy Pass) and fair: webmasters get a cut for each captcha that is solved correctly (they can choose to donate it directly to a charity of their choice), hCaptcha gets a cut for running the service, and a customer gets their images/data labeled.


An alternative? This service uses both Cloudflare and googleapis. What's that all about?


Aren't there configuration options/levels that can be set within CloudFlare to mitigate these issues?

EDIT: Another user posted this below, answering my question:

sgtfrankieboy:

In CloudFlare, go to "Firewall" and then click Settings on the right. Here you can set the Security Level and whether to use Browser Integrity Checks, among other things.


Yup there sure is, but the Cloudflare hate bandwagon has started. This is more on the site-owner than Cloudflare in my opinion.


4. Outsource. Use a ReCaptcha/Cloudflare filling service (usually just someone else manually typing these in).

I won’t link, but search “ReCaptcha solver” and you’ll find plenty.

It highlights just how broken the system is. It doesn’t stop determined spammers/devs until the value of the task is lower than the cost to solve.

Considering it’s 50c USD per 1000....


Everyone keeps bringing this up, but unless you have something of monetary value on the other end, this won't happen. ReCaptcha and a few if statements have stopped all contact-us form spam on our site. Same for other sites I help manage; no one is paying 50 cents per thousand contact-us spam messages.


I found all my contact form spam was being sent superhumanly fast. Well under 10 seconds from initial page load. A user can't make it to the form and type a meaningful message that fast.

Adding a short timeout eliminated my contact form spam. I also only allow JSON on the back end, so they must execute JS to even have a shot.

This has allowed me to avoid blocking Tor exit nodes... So far, anyway.
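A minimal sketch of the two checks described: a minimum time between page render and submission, and a JSON-only endpoint. It assumes the server embeds an HMAC-signed timestamp in the page so a bot cannot simply forge an old one; the `form_token` field, `SECRET_KEY`, and the 10-second threshold are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
MIN_SECONDS = 10.0  # assumed minimum plausible time for a human to fill in the form


def _sign(issued: str) -> str:
    return hmac.new(SECRET_KEY, issued.encode("utf-8", "replace"), hashlib.sha256).hexdigest()


def issue_form_token(now: Optional[float] = None) -> str:
    """Embed this in the rendered page; the client's JS sends it back with the message."""
    ts = time.time() if now is None else now
    issued = f"{ts:.3f}"
    return f"{issued}:{_sign(issued)}"


def accept_submission(content_type: str, body: bytes, now: Optional[float] = None) -> bool:
    # JSON only: a plain HTML form post arrives urlencoded, so clients must run JS.
    if content_type.split(";")[0].strip().lower() != "application/json":
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    token = payload.get("form_token") if isinstance(payload, dict) else None
    if not isinstance(token, str):
        return False
    issued, _, signature = token.partition(":")
    # Reject forged or tampered timestamps before trusting them.
    if not hmac.compare_digest(signature.encode("utf-8", "replace"), _sign(issued).encode()):
        return False
    elapsed = (time.time() if now is None else now) - float(issued)
    return elapsed >= MIN_SECONDS
```

Signing the timestamp is the design choice that matters here: without it, a bot could just send a render time from ten seconds ago.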


I've seriously considered doing exactly this. I've solved numerous challenges for Google this week alone.


My solution is connecting via nested VPN chains, working in VMs, and compartmentalizing stuff using multiple personas.

Reputable VPN services do a good job of keeping their IPs off blocklists. Occasionally I'll get blocked, because some jerk has been abusing the VPN server that I'm exiting from. But if it doesn't resolve promptly, I just switch to a different exit server.

So I only use this VM, and this VPN exit, as Mirimir. And given that, I don't go out of my way to prevent tracking. Not enough, anyway, to trigger blocking. Because I don't really care if everything that Mirimir does gets linked. Indeed, I pretty much always use "Mirimir" as my username, or sometimes "Dimi" or whatever.

If I don't want stuff linked, I use a different persona in a different VM, using a different VPN chain. Or that via Tor using Whonix.


I'd love to use an open source/more beneficial to society version of ReCaptcha - say validating OpenStreetMap data/project Gutenberg/something else. Maybe this already exists and I don't know about it!

Would sites then move to this and reduce the lock in and inflexibility with ReCaptcha?


I think the long term solution could be in making the network providers and ISPs more responsible for the malicious traffic originating from their networks.

For example the botnet traffic is best stopped at the origin. If there was some pressure for the service providers, I'm fairly sure they could do more to detect subscriptions with compromised devices and take appropriate actions. Actions can include educating the users and if necessary, blocking the subscription until problems are fixed.

While this certainly would not immediately cover the whole world, it would be a start. At the website level, you could then treat traffic from networks that have agreed to cut malicious traffic differently.


I don't see myself as a power user, but it would be very hard for me to give up my current setup. I honestly do not understand how my wife deals with it. Even with Pi-holed Wi-Fi it is a horrifying experience.


4. Mirror the contents of this Dark Web 3.0 on a Light Web 3.0 accessed using privacy-protecting technologies like Tor to make sure it remains accessible. Obviously this won't help for logging into Twitter and your Fecebutt account but it should be fine for the nginx documentation.

The underlying incentive here is that centralized websites are slow and vulnerable to DDoS. Massively mirroring their content is the solution, and it's what Cloudflare does. Let's do it in a way that protects human rights rather than taking them away.



I’m a fan of Recaptcha v3. There are many actions where you can ask for some additional input in a non-puzzle way. A simple example is sending a confirmation email before signup when the score is below a certain threshold.

Because I freaking hate those captcha puzzles...
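A sketch of the score-threshold idea, assuming the client's token is checked against Google's `siteverify` endpoint, whose JSON reply includes `success`, `score`, and `action`. The 0.5 threshold and the `signup` action name are assumptions; the decision helper is kept separate from the network call so it can be tested offline.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def verify_token(secret: str, token: str) -> dict:
    """POST the client's token to siteverify and return the parsed JSON reply."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)


def signup_decision(result: dict, expected_action: str = "signup",
                    threshold: float = 0.5) -> str:
    """Map a siteverify result to an action; threshold and action name are assumptions."""
    if not result.get("success") or result.get("action") != expected_action:
        return "reject"
    if result.get("score", 0.0) < threshold:
        return "require_email_confirmation"  # low score: ask for more, but no puzzle
    return "allow"
```

`verify_token` needs network access and a real secret key; only `signup_decision` can be exercised without them.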


Except v3 has to be on every page of the site and that gives Google a full view of all traffic on the site


So does Google Analytics, so does AdSense. ReCaptcha V3 is probably the least used of the three.


Blocking GA and AdSense doesn't render websites unusable.


Google doesn't force you to use AdSense on all pages. And Analytics, well, you can also skip it on sensitive pages.


4. Boycott Cloudflare.


Isn't that #1? Or do you mean more like what OP is experiencing by being banned?

GP already mentioned why #1 is not a solution (which I see the same way) and OP made it quite clear that not visiting CF sites isn't quite working either.


Because the author is using a niche browser (Qt wrapped in an app), this might be the Browser Integrity Check. Its purpose is to block non-browser-like behaviour common among spammers/malware.

That would also explain switching to Chrome fixes the issue.

https://support.cloudflare.com/hc/en-us/articles/200170086-U...


So if your user agent isn't Firefox, Chrome, or Safari, the internet can stop working. What a great side feature.


Yeah, because most custom browsers are malicious. They have the data to prove it. This isn't a side feature, it's a direct feature that is 100% intentional. They maintain a backend whitelist of known "good" user-agents. Curl is on that list and there are a few others outside of the big players.

Most people building custom browsers are doing it to do something Chrome would disallow. One example would be supporting only one weak-ish cipher, forcing TLS to use a predictable cipher instead of choosing the best available encryption for transit. While I agree some people have cool browser projects that would be nice to use, the blocking is a side effect of bad actors abusing the system. Most of the annoying parts of Cloudflare exist because bad actors have abused the system.


Any sufficiently bad actor will already modify their user agent. Who is this really stopping?


Bad actors who are bad at being bad actors, which is actually the bulk of bad actors.

It's maddening, but it's true. I've seen tales of people having to modify resource auto-generators that created URLs with hexadecimal identifiers in them, because the sequence "ad" in a URL would trip ad-blocking browser plugins. You might ask yourself "how many ad companies worth their salt have 'ad' in the URL path?" and the answer is "The ones who are worth their salt might not, but the ones who are terrible do, and they're probably terrible at other things too, like letting malware on their network."
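One way to sidestep that class of false positive, sketched under the assumption that the blocker does a plain substring match on "ad" anywhere in the URL: regenerate hex identifiers until they don't contain the blocked sequence. `BLOCKED_SUBSTRINGS` is a hypothetical list, not the contents of any real filter list.

```python
import secrets

# Hypothetical: substrings a naive blocker matches anywhere in a URL.
BLOCKED_SUBSTRINGS = ("ad",)


def safe_hex_id(nbytes: int = 8) -> str:
    """Random hex ID containing none of BLOCKED_SUBSTRINGS (retries until it doesn't)."""
    while True:
        candidate = secrets.token_hex(nbytes)
        if not any(s in candidate for s in BLOCKED_SUBSTRINGS):
            return candidate
```

With 16 hex characters only a few percent of raw IDs contain "ad", so the retry loop almost always finishes in one or two tries.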


I suspect that the reason that bad actors are bad at being bad actors is that the income is rather marginal and can't attract skilled devs away from more legitimate companies.


There's somebody who can build a custom browser but can't figure out how to change the user agent string?


They're called "script kiddies" and the trick is: they don't build the browser, they download a kit someone else built that has a user agent in it and use it for whatever purpose they intend to.

I went to school at a place that had a policy of soft-blocking network access for any machine that a port scan found with TCP or UDP 12345 open, because the NetBus trojan defaults to that port and people who built trojan horses to allow remote access didn't change the default. It caught a reasonable number of owned machines every year.

Don't overestimate criminals; if most were good at being criminals, they could be successful in society without having to break the law. ;)


The intersection of information security and game theory is constantly paradoxical.


Check server logs sometime. You'll be surprised how many malicious requests come from user agents that aren't regular, current browsers.


If you're willing to load up a page when you detect something suspicious, as CloudFlare does with their "browser integrity check" page, you can also try to fingerprint the automated tool. There's often something unusual about the setup like odd browser version, strange global JS symbols, etc.

Completely possible to work around of course, but it does increase the effort level quite a bit.


[flagged]


Perhaps it could, but how is that helping the conversation? I feel like all this statement does is dragging a preexisting (generally heated) topic into the conversation.


> Most people building custom browsers are doing it to do something Chrome would disallow.

Chrome is not, and isn't meant to be, DRM. There are DRM extensions for that, but Chrome (and let's extend this statement to any other whitelisted browser) does not try to limit what you can do to a website. The only restriction I can think of is the restricted-ports list, but if you want to connect to port 25 (typically used for SMTP/email), go ahead and override it (via about:config in Firefox, for example) and you can do it.


This can't only be based on user agents, otherwise it would be pretty useless. I can set my Firefox's user agent to curl if I feel like it, the same way malicious actors would just set the user agent in their scripts / headless browsers etc.


It's not exclusively UA, but in this post the author does say, I believe, that switching to the most up-to-date Chrome UA resolved his issue.

You would be SHOCKED how many bad actors use an outdated UA or some random string they think is funny. This portion of CF's mitigation isn't meant to be hyper-advanced detection, just to bounce out the low-hanging fruit. They have other security services that aim to mitigate the more advanced stuff (like the WAF).
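A sketch of what a "low-hanging fruit" user-agent screen could look like, flagging nonsense strings and obviously outdated browsers. This is not Cloudflare's actual logic; the version cutoffs and the `curl/` allowance are hypothetical, and anything this simple is trivially bypassed by sending a current browser's UA.

```python
import re

# Hypothetical minimum major versions; anything older is flagged.
MIN_MAJOR = {"Chrome": 70, "Firefox": 60}
ALLOWED_TOOL_PREFIXES = ("curl/",)  # hypothetical allowlist of known non-browser clients

BROWSER_RE = re.compile(r"\b(Chrome|Firefox)/(\d+)")
OLD_IE_RE = re.compile(r"\bMSIE \d+")


def looks_like_low_hanging_fruit(ua: str) -> bool:
    """Cheap first-pass UA screen; anything it lets through still needs other checks."""
    if ua.startswith(ALLOWED_TOOL_PREFIXES):
        return False
    if not ua.startswith("Mozilla/5.0"):
        return True  # nonsense strings like "1337 browser 2000", or ancient Mozilla/4.0 UAs
    if OLD_IE_RE.search(ua):
        return True  # IE 10 and older advertise "MSIE N"
    match = BROWSER_RE.search(ua)
    return bool(match) and int(match.group(2)) < MIN_MAJOR[match.group(1)]
```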


Is there a readily accessible process to get on said whitelist?

Because if not, what you're describing is a cartel colluding to keep the market controlled by oligopolies. Regardless of whether there's a good reason for them to do so.


User-agent checks don't make sense, because custom browsers can easily impersonate others: just set a "good" user-agent. If a custom browser has an evil purpose, why would it show that off? Changing the user-agent is very easy.


Oddly enough, they simply don't do that. IDK why, but they don't. Also, there is a bit more to the browser integrity check than just the user-agent. But, yeah. You'd be surprised how often I saw attacks get mitigated that were using some obviously bad UA. The attacks themselves seemed sophisticated enough, but the UA was still an IE UA string twelve versions old, or "1337 browser 2000", or something dumb like that.


Why doesn’t the bad actor fork Chromium or Firefox?


Fork? Everything you need can typically be done with instrumentation. Colleagues of mine do this sort of thing (on the request of the company they're targeting). The browser is headless, but still a full browser with a common resolution and everything, and is (virtually? completely?) indistinguishable.


> This isn't a side feature, it's a direct feature that is 100% intentional.

So Cloudflare is intentionally breaking the web? Good to know.


No they are doing their job of filtering out garbage from most websites, and it's an option that the site-owner can enable.

Is this such a novel thing to look for outliers in web traffic and offer ways to mitigate risks?


Both what I said and what you said can be true simultaneously. I have been increasingly down on Cloudflare because Cloudflare is cutting me out of an increasingly large portion of the web.


I also use Qutebrowser and just set my user agent to the top result on https://techblog.willshouse.com/2012/01/03/most-common-user-..., which I update every month.


If it's looking at the actual User-Agent string, can't you just set that to a well-known string? Chrome themselves are about to freeze their User-Agent: https://groups.google.com/a/chromium.org/d/msg/blink-dev/-2J...

Since your User-Agent string probably starts with "Mozilla/5.0" anyway despite not being Mozilla, you might as well just set it to Chrome's despite not being Chrome. (Vivaldi did that for other reasons: https://vivaldi.com/blog/user-agent-changes/)


There is another side effect of visiting sites served by CloudFlare with disabled JavaScript: Email addresses (or any other words separated by an @) appear as "[email protected]" instead. Take this blog post from CloudFlare themselves for example: https://blog.cloudflare.com/serverless-performance-compariso... That's just ridiculous.


How is it ridiculous? It's called email obfuscation (it can be disabled within Cloudflare), and it's meant to stop spam bots from ripping email addresses from websites and adding them to mailing lists.


Because their regex is crap and if you have @twitterHandle or something with a legitimate "@" you just see the obfuscated version.

It's laughably adorable to think it's actually solving a problem or helping in any way, the 'bad actors' it's trying to prevent probably have a work around anyway.


> It's laughably adorable to think it's actually solving a problem or helping in any way, the 'bad actors' it's trying to prevent probably have a work around anyway.

They do—changing your user agent is trivial.


Why would I want to send an incorrect user agent? There's no point having one if it's just going to lie anyway.


I agree. I wish browsers would just stop sending a user agent string entirely. I don't think that I've sent an honest one in over a decade.



Most of these protections are easy to work around, and people do just that, but that isn't the product cloudflare is selling.

To the target market, who have a spam issue, Cloudflare's protection sounds great, and by the time they've set it up, they won't switch CDNs just because it isn't effective enough.


The problem is therefore spam bots abusing the email system.


Yes, and Cloudflare provides a fairly reasonable solution to this problem. Or, at least, it seems reasonable in the eyes of someone like me who never had to work on something that attempts to solve this problem, so there might be some serious caveats, but I am not aware of those. If someone with more domain knowledge can chime in on this, that would be appreciated as well.


Caveats:

Won't protect against scrapers that execute JS (like ones based on headless browsers; this includes some modern search engines!)

Won't protect against anyone who takes some time to read their 80-some lines of minified "obfuscated"* JavaScript and hook up a simple text transform to their crawler of choice.

So basically it'll only protect against truly trivial scrapers, but not against anyone who wants to get at it and knows basic JavaScript. You could probably get about the same effectiveness by dividing the email amongst multiple <pre> tags...

*"obfuscated" in quotes because it just means whoever wrote it threw in a trivial-to-bypass XOR, some number-to-character conversions, and for good measure had the JavaScript remove its own script tag from the DOM after executing...
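To show how thin that layer is, here is a sketch of the kind of scheme described: one key byte followed by the address XORed with that key, all hex-encoded. This mirrors how Cloudflare's decoder is commonly described, but treat the exact format as an assumption rather than a documented API.

```python
import secrets
from typing import Optional


def obfuscate(email: str, key: Optional[int] = None) -> str:
    """Hex-encode one key byte, then every byte of the address XORed with that key."""
    if key is None:
        key = secrets.randbelow(256)
    return f"{key:02x}" + "".join(f"{b ^ key:02x}" for b in email.encode("utf-8"))


def deobfuscate(encoded: str) -> str:
    """What a scraper needs: read the key byte, then XOR the rest back."""
    key = int(encoded[:2], 16)
    raw = bytes(int(encoded[i:i + 2], 16) ^ key for i in range(2, len(encoded), 2))
    return raw.decode("utf-8")
```

The decoder is a handful of lines, which is the point being made: the scheme only stops scrapers that never look at the page's JavaScript.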


What you're missing is that this is effort for the botter. Headless costs more; writing another step in the scraping process and de-obfuscating seems reasonable, but again: that is effort for the botter.

If a botter really wants to it's easy to get emails scraped. But they don't care. The demographic of people having obfuscated emails on their page via Cloudflare (since you probably don't know every obfuscation solution out there you target the big ones) is also the demographic with a good spam filter (or just using Gmail).

Botters don't care about everything small. If you're bigger, you do get better ones who probably target you specifically, and then you have more problems than just having your email stolen.

The 99% solution from Cloudflare is complex enough to not get botted by shitty wannabe hackers.


CloudFlare's solution is to tack on more work.

A better solution would be to change the email system to hinder abuse.


Ok, so fix the email system, and then Cloudflare can remove the bandaid. Maybe fixing the email system is actually way more complex? Just a wild guess.


I don't see anyone here mentioning this, so I will. CloudFlare supports PrivacyPass, a third-party, privacy-preserving way to do proof-of-work for websites. Basically, you do some small amount of CPU work and get 30 tickets your browser can transparently redeem for access to a site.

That's a pretty clever way to both deter bad actors and ensure legitimate users get uninterrupted access to websites.


AFAIK the Privacy Pass protocol has nothing to do with proof-of-work. It's just a way to allow users to solve a CAPTCHA once and re-use that single solution across multiple sites without jeopardizing privacy. Reduces the number of CAPTCHAs you see, but doesn't eliminate them entirely.

What you're talking about is more like what Hashcash does, where it essentially replaces CAPTCHAs with a cryptocurrency miner, such that bots become more expensive to run due to the amount of energy they consume. The downside is it's not great for battery life for regular users either.
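A minimal sketch of Hashcash-style proof-of-work: the client searches for a nonce whose hash has a required number of leading zero bits, and the server verifies with a single hash. Hashcash itself uses SHA-1 and its own stamp format; this sketch swaps in SHA-256 and a made-up `challenge:nonce` string, and assumes the server issues a fresh random challenge per request so work can't be precomputed.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits in a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty hashes on average)."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the client's work."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

Each extra bit of difficulty doubles the client's expected work while verification stays at one hash, which is exactly the battery-life trade-off mentioned above.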


Captcha is also a kind of proof-of-work, and a very nasty one, where users are made to do the work instead of computers.


Sorry, you're right, I misremembered. You solve the CAPTCHA to get tickets, you don't do work.


I probably don't understand PrivacyPass well enough, but I don't really see how it's a good solution to the problem.


It makes it so you solve thirty times fewer CAPTCHAs, which is not nothing.


I've been noticing that a larger and larger portion of sites in Google results have been taken over by something. They all act the same way, similar to Cloudflare's checks: you get a big green dot on the screen first and then a bunch of redirects. I haven't read anything about it, but probably 1 in 15 of my Google result clicks end up there. How does Google not know these sites have been taken over, and why does it keep them on the first page of search results? It's getting tiring, because they also hijack your history: you can't just hit "back" to return to the results page and go to the next result. You lose everything you've been searching through and have to start the search over, remembering the exact terms you used in "that last search".


I have a little tip for you; right-click your back arrow, and choose your search page from the list.


At least on iOS Safari there's this bug (feels like it's always been there, maybe content blocker related?) where sometimes a search simply gets eaten. The browser somehow thinks you went from nowhere straight to the page you're on, even though you went past a Google results page.

I assume it's somehow redirect-related and that's why these sites tend to trigger it.


I’m thinking of two situations:

- you tapped the “Siri suggestion” result which completely skips the SERP. I hate that “back” doesn’t bring you back to what I typed in the search/URL bar

- I regularly visit The Verge, open a story and then the back button doesn’t take me to the homepage but to the page before it. I blame their crappy JavaScript but maybe we’re experiencing the same thing.


Nope. It's as if that happened, except that I did go through a regular Google search results page, which gets dropped from history (along with the search itself).


Nope what? I described that same thing on The Verge.


Confirming this happens to me too. I feel like I first saw it on iOS 10 or so.


Another tip: block JS globally (uBlock Origin) and whitelist sites you trust.


I really wish you could just directly pay for services. A hundredth of a cent per visit would make most abuse un-profitable while still allowing very affordable procrastination.


People always ask, "Why not micropayments?" Micropayments require even smaller transaction fees. Really small transaction fees require centralized regulation and economies of scale, i.e. government control and monopoly. You can't have both your freedom and your frictionless markets. The more anarchy, the more expensive buying and selling is, and the more business models and options for wealth creation simply can't work.


> Really small transaction fees require centralized regulation and economies of scale, i.e. government control and monopoly.

No they don't.

I could sign up for the ten most popular micropayment services and the fees would be about the same as if I signed up for just one.


Are there ten popular micropayment services? Does it cost anything in time and effort to use them? Who do you trust to tell you which micropayment services are good and trustworthy?


> Are there ten popular micropayment services?

It exists just as much as your government monopoly. We're talking about a world where sites let people pay microtransactions, not the current state of the real world. I'm just pointing out that it is definitely not necessary to have government control to have really small transaction fees.

> Does it cost anything in time and effort to use them?

Whatever the government version you proposed would do, let's keep it simple and make them act exactly the same.

> Who do you trust to tell you which micropayment services are good and trustworthy?

I dunno, who tells you that Visa or Stripe or Ko-fi is good and trustworthy? If creators flock to a site, they'll draw in users.


I also hope this happens. It would be interesting too, if it was bidirectional (services paying you to use them. Payment for your data, essentially).


(1) totally agree re money being an answer here, (2) not sure it solves DDOS mitigation, especially if you're giving away a free preview


This would make search engines impossible, sadly.


Why do you think that?


Contrary, I think it would make search engines better!


Cloudflare blocks an enormous number of bad actors for me, so I quite like it.


I'm curious about the specifics because through my and my colleagues' simulation of "bad actors" (ethical hacking) I've never once gotten stopped by CF.

I thought the point was anti-DDoS by just proxying your traffic through someone with bigger pipes. That they do TLS offloading to filter n-days like Heartbleed helps as well of course, but those are super rare events and it sounds like what you mean is ongoing.

What kind of bad actors do you mean, and what kind of sites? Don't have to mention the domain or anything, just that it makes a difference whether it's a web shop (financial risk I guess), more like a forum (spammer / hater risk), or something else.


Generally, the DDOS protection even at layers 3 and 4 is a blessing. For a layer 3/4 attack, all a malicious actor really needs is a bunch of average internet connections anywhere; and depending on your server hardware, it might be inexpensive to keep it down since those IPs are cheap. CF just drops all packets from these "literally 99% of traffic is malware" IPs on the network layer.

But a layer 7 DDOS attack, when going through Cloudflare, means the malicious actor needs to have IP addresses that are at least not complete trash in terms of IP reputation. Getting access to a botnet and access to these IP addresses isn't exactly prohibitively expensive, but it's a much larger barrier to entry.

It's even harder to get taken down by a layer 7 DDOS attack on Cloudflare if you use "I'm Under Attack" mode, assuming your attacker isn't paying even more for the botnet to run something like Chromium or Node to hit your website.

Finally, while Cloudflare doesn't actively do this for small-scale DDOS attacks (since it might just be a spike of users), they do have Gatebot for the larger scale ones https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-u....


It wouldn't have a place if it didn't provide value.

I am a normal user though and I (thought I had) absolutely no way of viewing your content now if you use cloudflare.

Turns out using Chrome works...


Cloudflare is also making TOR quite useless.


This isn't exactly correct (at least not in all cases): https://blog.cloudflare.com/cloudflare-onion-service/

[Nitpick: it's Tor not TOR]


I don't know the specifics on this, could you elaborate?


Half the websites I visit give me the CloudFlare screen like

    One more step
    Please complete the security check to access <whatever>.
followed by either an endless stream of reCAPTCHAs or a single, completely impossible one.


If CAPTCHA loops are still an issue, you should write in. JGC takes that stuff seriously. You're more likely to see an IP block, because most site owners do not whitelist TOR endpoints, or they specifically block them because that traffic is mostly abuse.


I'm guessing JGC is some dude that works at CF, but where should one write in? You're literally looking at a Google CAPTCHA, you can't go to the contact page because that's behind this proxy that won't let you pass.


JGC is fairly popular on here and is CTO at Cloudflare. You can simply write into support@cloudflare.com, those tickets are forwarded to engineering teams.


Support options would be good info to put on that CAPTCHA page instead of having to find that somewhere deep down in an HN thread.

Wasn't HN also behind CloudFlare? Looks like that changed, but maybe it will be again in the future.


I do believe that there was a point where HN was using CF, but that hasn't been true for a while if memory serves.

As for support@ not being on those error pages: decent feature request. I imagine the reason they want to avoid it is that many of these errors are delivered at the request of the site owner or are related to the site not working (404s, 503s, IP firewall blocks, etc.), so they do not want to funnel people into Cloudflare support for issues that are not specific to Cloudflare.

Determining which errors are the site owner's responsibility and which are Cloudflare's responsibility can be quite tough.


It's not rocket science, they just don't want to solve the problem.

"many of these errors are delivered at request of the site owner" For those, put the site owner's contact method there. Even a physical mailing address, fine by me, I'll send a letter (something a spammer would not do) if it's important enough to me to do so.

"or related to the site not working (404s, 503s": those pages don't serve a Google CAPTCHA or say "You have been blocked". If they can determine whether a page should have a CAPTCHA and/or that text, then that same if statement can also decide to show contact info.


I think the truth is somewhere in the middle here. Yes, Cloudflare could do a bit more to predict this, but I don't think it's as trivial as you make it. The routing between you and a site through Cloudflare involves a lot of complex interactions.

The CAPTCHA page, sure, maybe. I can't think off the top of my head what would happen on that page that wouldn't be related to Cloudflare/reCAPTCHA. I conceded that it's a decent feature request. But plenty of the interstitial pages served by Cloudflare aren't necessarily caused by Cloudflare. The fact that you get a CAPTCHA at all isn't Cloudflare's choice most of the time; it's the site owner's. And having support@cloudflare.com on that page would 100% cause people to write in saying they don't want to see CAPTCHAs. That's not the appropriate party to contact about CAPTCHAs on a specific site. Now, SOMETIMES it's an automated response triggered by your IP, and then you DO want to reach out to Cloudflare.

Same with 500 series errors. Sometimes it's the website not responding, but sometimes it's Cloudflare not interacting properly.

So yeah, I think the truth of the matter is in the middle here. In terms of priorities, I have no doubt this is pretty low on their list. Why would it be any higher when they serve the technical purpose they were created for? The rest of that is QoL with minimal impact on customers compared to many other issues that go wrong with the network that have considerable impact on customers and visitors.


Doesn't the Tor browser now come with PrivacyPass by default?


PrivacyPass plus changing UserAgent to latest Firefox (instead of default TOR’s) reduces amount of these blocks significantly.


CF has nothing to do with that, website operators can choose to blacklist Tor exit nodes regardless of whether they use CF. Many of them do, because Tor is such a massive source of malicious traffic. It sucks, but it's not CF's fault.


It's the default though, no? Most owners don't go through every setting.

It has been a long time since I used CF though, so maybe there is a question in the setup phase or only a few settings.


FYI, the new/current version of Qutebrowser just added a built-in list of websites that require a "Chrome/Chromium" UA. It also added logic so that parts of the UA can be auto-updated.

Shameless shill: Qutebrowser is by far the best browser I've ever used. The half measure of using add-ons (even powerful ones like Pentadactyl) cannot be compared to having a browser that is power-user friendly in every aspect, from config to UI. If a site doesn't work well with it, then I'm probably not going to use that site. If I can move away from Google, then I can find your article/post somewhere else.


This mirrors my experience and opinions. However, I miss certain powerful adblock features, and I would prefer more/easier control over the JavaScript that gets executed, since JS is by far the largest consumer of energy on my laptop.


imo, they should have used the Chrome UA as a default for all sites. Leaking to sites what browser the user is using does not seem very privacy-respecting.


There is a stage where "good" companies get so big that they can't really be "good" (For all)...

Cloudflare passed that point a while back. They have so many policies, departments, shareholders and government agencies to please that it is now impossible for them to be a truly good force for the open internet.


We need a website that tracks Cloudflare's CAPTCHA and blocking behaviour! I live in Cyprus (the EU part!) and I've seen more and more sites now demand these stupid CAPTCHAs. I actually installed "Privacy Pass", which also doesn't work 100%. Why is this (EVIL!) company destroying my internet experience? This is a relatively new experience for me... something has changed in Cloudflare's CAPTCHA requirements. Before that I had only witnessed this behaviour in countries like the Philippines.


I suspect Cloudflare is doing some form of 'fingerprinting' to flag potential attacks. Fingerprinting is probably based on things like IP, user agent, js being enabled, etc. In this case it seems that Cloudflare only banned a specific user agent with js_enabled=no.

Obviously this is all just an educated guess, based on my experience building scrapers for Cloudflare-protected websites.
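Purely to illustrate the guess above, here is a toy heuristic of that shape. Everything in it is hypothetical: the `RequestFingerprint` fields, the marker list and the rule itself are assumptions, not Cloudflare's actual logic. The request is flagged only when an unfamiliar user agent coincides with JavaScript being disabled, or when the IP is already on a bad list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestFingerprint:
    # Hypothetical signals a proxy might see for one client.
    ip: str
    user_agent: str
    js_enabled: bool


# Hypothetical substrings treated as "mainstream browser" markers.
KNOWN_BROWSER_MARKERS = ("Chrome/", "Firefox/", "Safari/")


def looks_suspicious(fp: RequestFingerprint, bad_ips: set) -> bool:
    """Toy rule: bad IP, or an unusual UA combined with JS disabled."""
    if fp.ip in bad_ips:
        return True
    unusual_ua = not any(m in fp.user_agent for m in KNOWN_BROWSER_MARKERS)
    return unusual_ua and not fp.js_enabled
```

A rule like this also explains why "change your UA to match Chrome" works as a fix. The UA string is the cheapest signal to spoof, so it only helps as a tiebreaker alongside others.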


Came to say the same thing.... Try changing the user-agent the browser sends to match the user agent that Chrome sends.


Also related: depending on where in the world you reside, you may be more likely to encounter Cloudflare blocking sites or simply slowing the retrieval of websites (as it scrutinises traffic to those sites). This may be more likely to occur to you if you access the web outside of North America and Western Europe.

CloudFlare is ruining the internet for me (2016)

https://www.slashgeek.net/2016/05/17/cloudflare-is-ruining-t...

And the ensuing Hacker News discussion a few months ago:

https://news.ycombinator.com/item?id=21169798


To be fair, site owners are in control of those settings.


What are the defaults of those settings? What percentage of site owners change the defaults?

It's kind of like saying, "It's your own fault you didn't deselect the 'track everything I do' checkbox on our privacy page."


Which settings would those be? I'm using Cloudflare for a couple of my sites, and if I can fix this, I will. Otherwise I'll stop using Cloudflare.


In CloudFlare go to "Firewall" and then click Settings on the right.

Here you can set the Security Level and if you want to use Browser Integrity Checks among other things.


Thanks! I will look into that further.


The simplest way is just to use CloudFlare for DNS only (grey cloud button) until you're under attack.

edit: do not follow my advice


This is a horrible recommendation and will ensure that the attack can continue after activating CloudFlare. You've already exposed your origin's IPs in this circumstance.


Yikes, of course.


Can't you ask your provider for a new one?


Your provider would have to null-route the IP under attack, and you'd have to wait for DNS caches to expire before your updated zone reaches clients.

Short TTLs are not honoured by everyone so you'll experience some downtime.


Is there any way around that? I'd rather set it and forget it so I don't have to worry about the attack in first place.


Somewhat to my surprise, my SiteTruth system is allowed to read "bakadesuyo.com", the site the article author had a problem with. SiteTruth uses a user agent like "Sitetruth rating system" and makes no attempt to pretend to be a browser. Cloudflare let it through. No CAPTCHA. No error messages.

Can't conclude much from this; SiteTruth has been reading sites openly for years in a well-defined way from a well-known IP address, examining them for ownership info about once a month. Although it looks at millions of sites, it never hits any one site very often. From Cloudflare's perspective, that's harmless.


I also refuse to complete captchas because I'm not interested in giving a trillion dollar company my free labor to help train their models.

I'm really close to just defaulting JavaScript off entirely, the web is becoming so much worse by the day.


I've had js turned off by default for many years now, and don't feel like I'm missing out on anything important. Your mileage, of course, may vary.


You're a stronger person than me.

I keep trying to live without JS but so little of the internet works.

* Gitlab/Github (obviously)

* Google maps (obviously)

* Linkedin... (uh... less obviously)

* Outlook

* Infoq.com

* Rust docs

* Google Cloud Docs

etc.

A lot of the internet is butchered without JS.

I really want a way of just blocking third-party JS (i.e., the site can deliver its own JS, but not anything it tries to import unless whitelisted). But that seems to be hard with qutebrowser.

FWIW uMatrix apparently has a method doing this.
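The rule being asked for is roughly: allow a script if it comes from the page's own host (or a subdomain of it), or from an explicitly whitelisted host. Here is a minimal sketch of that check. The function names and whitelist are hypothetical. Real blockers like uMatrix compare registrable domains using the Public Suffix List, while this sketch only does a hostname-suffix match. For example, it would treat `static.example.com` as third-party on a `www.example.com` page.

```python
from urllib.parse import urljoin, urlsplit


def host_of(url: str) -> str:
    return urlsplit(url).hostname or ""


def is_first_party(page_url: str, script_url: str) -> bool:
    """Same host, or a subdomain of the page's host."""
    page_host = host_of(page_url)
    # Resolve relative script paths like "/static/app.js" against the page.
    script_host = host_of(urljoin(page_url, script_url))
    return script_host == page_host or script_host.endswith("." + page_host)


def allow_script(page_url: str, script_url: str, whitelist: set) -> bool:
    """Allow first-party scripts plus hosts the user has whitelisted."""
    script_host = host_of(urljoin(page_url, script_url))
    return is_first_party(page_url, script_url) or script_host in whitelist
```

The whitelist matters because, as noted in a reply below, many sites serve their legitimate JS from a CDN on a different domain.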


Perhaps I did not express it clearly, but I do not browse WITHOUT ANY JAVASCRIPT EVER. I browse with js disabled by default, but enable it (using excellent uMatrix Firefox addon) for certain sites that I trust. Although even then I try to find the least possible amount of uMatrix permissions that enables the functionality I need from a site.

That said, I do not use many of the obvious mainstream sites - e.g. I ditched Github like dirty socks the moment Microsoft grabbed them.

But yes, modern web (not the Internet, mind you) is very damaged, and I fear it will take decades to fix the damage, once (I hope) smarter people take the reins after high-visibility security and privacy incidents become more and more frequent, and, well, more visible to general public.


I use noScript with "Allow None by Default" on Firefox while I'm at work and should in theory only be browsing docs anyway. Trying to blanket ban third-party JS wouldn't really work b/c many sites have their legitimate JS on various CDNs. After enabling individual scripts on maybe a dozen sites I'm in good shape when it comes to web browsing day to day.


They’re also using your labor to train models by serving you pages and checking which links you click. No js required.


Just because he's being exploited by Google in one way doesn't mean he has to allow himself to be exploited in every possible way.


If you care about being exploited, it’s worthwhile to know when it’s happening, so you can consider your return on effort for avoiding it.


Do you consider "exploitation" here to be negative or unethical?

He is willingly using Google because they provide amazingly useful services completely free of monetary charge. Are you objecting to the fact that Google benefits in some way from providing this service?


CF appears to be looking at the user agent and deciding to block based on that.

I'm guessing it's very basic checking, because deep browser fingerprinting is supposed to be against the law in some countries (I stand to be corrected on this).

I'm not personally a fan of CF because of the amount of data they can potentially obtain (or do obtain), but there's a lot of crap out there, and their firewall is robust enough to keep John's cowboy store from becoming part of some dude's Monero-mining botnet.


I think the site went down. Here's the archive: http://archive.is/dVfYE


Ironically; Cloudflare would be quite useful for me right now. :)


ReCaptcha is a necessary evil, but there's something that needs to change: the internet-wide rate limit. If I try to scrape a site and start getting captchas, when I go back and try to browse a completely different site, I'm suddenly trying to drag the points to outline a plane for ten minutes. Keep track of reputation on a site-specific basis.
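The site-specific approach being suggested could look like a token bucket keyed on (site, client) instead of on the client alone, so hammering one site doesn't drain your budget everywhere. This is a sketch of one possible design, not how reCAPTCHA or Cloudflare actually track reputation. The class name and parameters are hypothetical, and the clock is injectable so the behaviour can be checked deterministically.

```python
import time


class PerSiteRateLimiter:
    """Token bucket per (site, client) pair; buckets never affect each other."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec   # tokens refilled per second
        self.burst = burst         # maximum tokens a bucket can hold
        self.clock = clock
        self.buckets = {}          # (site, client) -> (tokens, last_timestamp)

    def allow(self, site: str, client: str) -> bool:
        now = self.clock()
        key = (site, client)
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        # Refill based on elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

With `burst=2`, a client scraping site "a" gets throttled on its third rapid request, while its first request to site "b" still goes through.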


Random web hosts also have sloppy anti-bot 'AIs'. SiteGround's is the most annoying, because for some reason it thinks my work's network is a bot, and it doesn't identify itself unless you reverse-image-search its little anti-bot picture.


So basically the point of the article is this: there are no perfect tools. Not sure why this article got so much traction.

It's like complaining that airport security checks your bags when you have gun-shaped objects inside it.


More like asking why you are being kicked out of the airport when you have a bob hairstyle.


Google's reCAPTCHA is a much bigger problem than Cloudflare's service for me currently, but that could change at any time.


This is the death of alternative browsers. It's only a matter of time before Firefox and friends get blocked from major websites for "security" reasons, and you will only be able to browse the internet in a browser that enforces full tracking and ads. The MPAA and similar organizations are probably working on this already.

