This wasn't true in 2016 and isn't true in 2019 either.
Whilst the system was named and blogged about in Sep 2017 (https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-u...), the internal repo has its first working code drop from Feb 2015.
The (D)DoS protection systems and team, which I am the Engineering Manager for, treat all attacks equally according to the nature of the attack rather than the target of the attack.
To address other parts of the blog though, there is nothing in our system that defaults to "do X for IP/Country y"... what there is, is learned state about traffic, clients, and the many layers of customer expressed configuration.
It’s not that Cloudflare is ruining the internet for you. It’s the loads of illegitimate traffic and attacks that just so happen to originate in your country.
"They just so happen to look like gorillas!"
This is not okay.
I know it's just a pile of linear algebra and there's no conspiracy to screw a particular country.
But once it's brought up that your system is causing harm by increasing inequality in the world, you can't just say "But that's what the algo is spitting out!".
At least try to fix it.
How do you fix that which is not broken?
I understand the problem you are trying to point out, but the problem is not in the technology, it’s in the application.
If the country is the source of an outsized number of bad actors then – probabilistically and at scale – "you" are responsible, yes. I'm not trying to be blame-y here in that I mean the probabilistic "you" and not you personally.
Whether you personally deserve the consequences of that responsibility is more ideological than pragmatic - you probably do not deserve it, but how do others find out which you are? That brings up questions of system design, and avenues available for remediation.
If your small country creates outsized issues for the internet as a whole it might not just be defensive measures that are to blame for insufficiently punishing or defeating attackers. In that case, economics and politics might factor in and the question becomes different.
It's small enough that it probably couldn't even bring down a single CloudFlare-protected website IMO. And when we've seen stats here on HN about recent attacks on GitHub, the total bandwidth was what, 1.35Tbps (~169GB/s)? (Link: https://www.zdnet.com/article/github-was-hit-with-the-larges...)
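As a sanity check on that conversion, a one-liner (a sketch; decimal units assumed, 8 bits per byte):

```python
# Convert an attack rate quoted in terabits per second to gigabytes per
# second: 1 Tbit/s = 1000 Gbit/s, divided by 8 bits per byte.
def tbps_to_gb_per_s(tbps):
    return tbps * 1000 / 8

print(tbps_to_gb_per_s(1.35))  # 168.75 -> roughly the ~169 GB/s quoted
```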
I've met sysadmins here in my small country -- guys and girls who administer several thousand 1Gbps fiber-to-the-home connections -- who said that for a ~$15,000 tech investment they could build a small infrastructure that can withstand a 10Tbps DDoS attack, provided their upstream ISPs don't cripple bandwidth under pressure.
They may well have overestimated themselves, but it's a data point regardless.
I don't know for sure either way because I am not a professional sysadmin or a hardware guy. But I feel a lot of hosting providers are just chickening out. Perhaps it's the ingress/egress charges? But 169GB/s definitely doesn't sound that scary. My $140 router can handle sustained 110MB/s for hours. I can't imagine there are no clusters of enterprise-grade tech that can sustain 200GB/s constantly.
I see a throwaway account but let us know, what country are you from? Seeing how abusive US behavior is on the internet and in other domains, for me as an American, I'd be afraid to practice what you preach.
It's not classifying personal behavior, it's putting an additional barrier based on the country you're from. This is done in pretty much every other form of international interaction as well (visas, extradition, commerce, etc.).
Restating the current status quo isn't changing anything.
You could argue that this is a great improvement over the status quo, which just blocked entire countries out of hand.
Help me here.
You aren't being specific. You only restated that you leave a lot of visitors at the mercy of statistical algorithms.
If it is not IP/country, then what would cause this user to be prompted for captcha repeatedly?
I imagine the solution should come about as follows.
Customers complain about captchas and seek alternative ISPs.
Someone else provides better connections without tainted IP addresses (without sketchy customers).
OR the current ISP decides to crack down on customers who are being labeled as abusive. Problem solved.
Someone is likely providing cheap internet to sketchy people, who taint the experience for other users. It is on them to crack down and stop selling to sketchy users. Ban their payment methods/identities, so Cloudflare doesn't have to ban IPs.
The good news is Safari and Firefox are making this harder and harder. The bad news is that you’ll be solving a lot more captchas.
I'm convinced that reCAPTCHA is now just one of Google's tools to keep us all trackable and stop us from protecting our privacy. And CloudFlare is bringing it to half of the Internet. Bad bad bad. And shame.
This is my pet peeve, too. I browse over a VPN all the time, and these things constantly pop up and I hate them. I wouldn't have a problem with this mechanism if it worked, but it doesn't! It will randomly fail in various silly ways: new challenges fade in so slowly that waiting for them is agonizing; other times it will just keep asking you to solve new challenges even though you're solving them correctly (I counted them once, and it asked me to solve one 11 times before I gave up). And sometimes it will simply fail to load properly, or it will break websites in various ways.
Lately, I've taken to solving the challenges badly in subtle ways, just because I can. If I'm gonna train Google's neural networks for free, I might as well train them badly...
Can you elaborate on this technique? Do you just purposefully solve them incorrectly a number of times before solving them correctly or something else?
At this point, to be honest, I'm really not sure about the effectiveness of this mechanism anymore. It seems to me like a poorly-trained neural net with convincing human-like interaction (e.g. clicks spaced unequally) could produce exactly the kind of slightly-incorrect results that let me through. There's no way a smart 17-year-old whiz kid hasn't figured it out already and isn't getting rich selling it to spammers in onion space.
This is a lot and it makes you realise how dangerous this situation is. Way too much traffic goes through cloudflare.
>Privacy Pass is a Chrome/Firefox browser extension to make browsing Cloudflare-protected websites a better experience for users. In particular, if a user IP address is designated to have a poor reputation then the user may have to solve a Cloudflare CAPTCHA page before they can gain access to such websites. Privacy Pass uses elliptic curve cryptography to generate 'anonymous' tokens after a single CAPTCHA page is solved. These tokens can be used in future engagements with Cloudflare websites to prevent having to solve more CAPTCHAs. The extension generates 30 tokens for each CAPTCHA solution and thus can be used to reduce CAPTCHA pages for each user by a similar factor.
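The "similar factor" claim in that quote is just batching arithmetic. A toy model (this is not the real elliptic-curve protocol, only the effect of spending one token per challenge):

```python
import math

# Toy model of Privacy Pass batching (not the actual crypto): one CAPTCHA
# solve mints a batch of tokens, and each subsequent challenge spends one
# token instead of prompting the user again.
def human_solves_needed(challenges, tokens_per_solve=30):
    return math.ceil(challenges / tokens_per_solve)

print(human_solves_needed(300))  # 10 solves instead of 300 prompts
```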
EDIT: people seem to be confused as to what bug cloudflare shipped. The bug is not having people solve captchas because their IP has a bad reputation. It's having them solve it over and over again.
You can put it however you want, but if my app's UX is fine without Cloudflare and shit with Cloudflare for a small but significant percentage of my users, then CF has a bug.
You seem to be confused about what your rights are around website availability. Hint: you have no rights. Absent specific coercion by government, the owner of the website has all the rights. If she wants to require you to solve a Where’s Waldo first, that’s her prerogative. Your choice is to accept the terms or go elsewhere.
You know why? Because the ad-revenue is worthless (and often malicious) and the users will be more trouble than they are worth. Same thing is happening with net traffic from other low value regions. One star reviews because users from $banned_region are complaining about lag due to their crappy wifi and/or some other issue you have no control over (defective ram in their 6 year old 2nd hand phone comes to mind)? Sign me up!
In these developing countries, great swathes of users are accessing the internet behind carrier-grade NAT.
This makes it increasingly likely that any individual user is sharing a public-facing IP with one or more bad actors.
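A rough sketch of why: assuming (unrealistically) that users behind the NAT behave independently, and picking an illustrative abuse rate, the chance of sharing a public IP with at least one bad actor climbs quickly with pool size:

```python
# Probability that a CGNAT user shares a public IP with >= 1 bad actor,
# under the simplifying assumption of independent users. users_per_ip and
# p_bad below are illustrative numbers, not measured values.
def shares_ip_with_bad_actor(users_per_ip, p_bad):
    # Complement of "none of the other users on this IP is a bad actor".
    return 1 - (1 - p_bad) ** (users_per_ip - 1)

print(round(shares_ip_with_bad_actor(1000, 0.001), 2))  # ~0.63
```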
In my experience, I’ve never had to solve more than one CAPTCHA per domain, and frankly clicking a checkbox isn’t that hard.
As far as discrimination goes, this is a much friendlier solution than just immediately rejecting connection requests from certain CIDRs, which is what would otherwise be happening.
If it were that easy, there would be little complaint; the complaints seem to be that people get stuck on capchas indefinitely.
Do you have any citations showing that CGN is any more prevalent in developing countries than in, say, Western Europe or the US? The last report from RIPE that I read indicates CGN usage is substantial in both the RIPE and APNIC regions. How would IPv4 resource exhaustion be an economic issue?
>"In my experience, I’ve never had to solve more than one CAPTCHA per domain, and frankly clicking a checkbox isn’t that hard"
I imagine if you are personally "responsible for maintaining a relatively aggressive set of Cloudflare WAF rules", as you stated, you've probably become quite proficient at solving CAPTCHAs. I think people who don't mind jumping through hoops are a minority. And even if something isn't hard, that doesn't make it any less annoying or degrading of the user experience. Those things are not mutually exclusive.
I tell Cloudflare to block all traffic from China because my services derive zero contribution and zero potential value from the Chinese market. The maximum potential positive contribution from China is near zero. The overwhelmingly likely contribution from China is attacks from within the country.
So, to summarize, in my particular case China provides nearly zero positive value and China is simultaneously one of the biggest attack origin countries. It would be the wrong decision to not aggressively discriminate against their traffic: I lose, in real terms, absolutely nothing from blocking all Chinese traffic.
Maybe it would help you to travel the world more, but once I did I had a different view of things. The internet is truly a global entity, and the more we can do to keep the Internet unified the closer we can bring the planet together. To me that’s a much more important goal than short term profits or mitigating trivial attacks with poorly thought out geo-restrictions.
At this point I basically refuse to use things with recaptcha or the stupid little cloud flare dots. I just close the tab and move on. I am just so tired of little dots, storefronts, cars, etc...
That they've resorted to providing an extension to do this suggests they've deliberately engineered their services not to have access to that data internally, and that's a good thing.
An extension is easy to install and is a reasonable way for the CDN to verify that you're not a spammer without requiring you to repeatedly prove it whenever your IP changes.
But they require it for static-like content that any decent site should be serving from cache.
That sounds like a powerful use of personal choice to me -- allowed by an internet that (still) allows individuals to make choices in their own best interests.
Notice that you never see Akamai presenting these messages that you've been blocked.
Most of these pages where you get blocked look entirely static and should be cacheable with the most basic nginx even if dynamically generated, yet Cloudflare tells everyone that they need to protect such content from the users. (Some of their newer competitors that protect from more "bots" are even worse, BTW.)
Edit: Even the CIDR block size isn’t a good indicator of the actual network size, due to NAT.
It's like that IBM saying: no-one's been fired for buying IBM. Doesn't make it a good choice, though.
The worst I've seen is the SPIN website which always requires a captcha.
Cloudflare is just one more middleman extending their tentacles over the web.
WTH is this newspeak, just say website owners.
CF is a free service. Websites can choose to use it or not, and it certainly does not dictate the nature of the internet.
Yeah, right. I think double quotes is more appropriate here.
Two things that have changed since then are 1) Cloudflare’s Privacy Pass browser extension and 2) their significant network expansion, both of which would be likely to affect the experience described.
One thing that has not changed, however, is how many (typically unsophisticated) web site operators actively search their logs for signs of suspicious / bot activity and then institute manual blocks in the hope of catching all of them. This is often done with very blunt instruments, such as whole-country blocks.
In contrast, people who are confident about their infrastructure can deal with the background noise of the Internet appropriately—by doing nothing.
When CF fails to protect against the small 30 request per second flood once, they're probably going to go through and add a bunch of aggressive blocks like blocking entire countries, as you've said.
I'm aware of the Chrome extension (really, why should I install this sh&+t in the first place?) and that you can change Cloudflare settings. But the usual IT admin won't change these settings and will f&+ck up Asia.
Since 100% of the traffic (based on our analysis) coming from Asia is not legitimate business traffic, how would you advise those responsible for these sites security to handle this?
Edit: I have no interest in using Cloudflare...
The sheer volume of bot traffic surprised me at first, especially since my website has zero human visitors as far as I can tell, but the numbers are consistent month after month.
Nevertheless, my $1/month VPS can handle the traffic without a problem, so I see no need to ban or rate limit any IPs, especially since I hate captchas with a passion.
I run a company that routinely scrapes government run, public domain websites. Sadly, many of these sites come with captchas. We can easily bypass these captchas by paying roughly $1.50/1000 captchas, but when scraping millions of pages a month, these costs become significant.
As far as I can tell, adding a captcha to a site does nothing to prevent bots, it just alters the economics of any business that relies on the data. I understand that bots can potentially slow down servers and cause disruptions for human users, but for the handful of government agencies that actually talk to us, we happily restrict scraping to certain hours of the day or limit overall traffic to a reasonable level. I would go further and happily give the money we're spending on solving captchas back to the government so they can upgrade their servers and make the system better for everyone.
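The cost shift described above is easy to quantify (the $1.50/1000 rate is from the comment; the monthly page count is a hypothetical stand-in for "millions of pages"):

```python
# How a CAPTCHA changes scraping economics: it adds a per-page solving fee.
# price_per_1000 is the rate quoted above; pages_per_month is hypothetical.
def monthly_solving_cost(pages_per_month, price_per_1000=1.50):
    return pages_per_month / 1000 * price_per_1000

print(monthly_solving_cost(2_000_000))  # 3000.0 -> $3,000/month at 2M pages
```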
For those that are conducting nefarious activities, captchas likely do nothing. For individuals, they are annoying. For legitimate scraping companies, they are a needless expense. Captchas are pretty obsolete.
Definitely agreed. Recently I have been working on a side-project that makes use of bypassing/placating reCaptcha and it has been trivial and not so costly.
If it is accounts you're creating, it simply puts a "reasonable" price on account creation. If it is about scraped content, once again, does the same. However these costs already existed in terms of compute resources and time anyway. Captchas hardly made it any harder.
However, keep in mind that low price often comes at a cost. For example, cloudatcost had a cheap VPS with a onetime payment for life (yeah, I know, too good to be true), but then retroactively invented a maintenance fee. Also several days of downtime were not uncommon.
Before the complete lockout it was taking around 20 minutes of buses, bikes, sidewalks, and stoplights per login. Nasty feeling when you come back and realize you have to log back in for some reason.
AlgoVPN works great regardless of provider. It's more secure and private than public VPNs (not to mention fast).
This comparatively to stuff like ExpressVPN where I had to go through the aforementioned hurdles.
Personally I'd recommend AWS or GCP, given the free credits anyways.
At least I'm lucky enough that my VPN is for some reason still not detected as a cloud hosting provider by either cloudflare or netflix.
I do see that with traffic from China (but then blocking ASNs from Aliyun and other cloud providers is enough to stop most of it), but I don't really see that from HK. I would say that blocking all traffic from a region instead of just blocking ASNs from cloud hosts is like using a hammer to kill a fly. It might work, but you get collateral damage.
And regardless when I do block ASNs, I block uncached resources. What's the point of blocking static pages?
Most network attacks and spam actually come from the United States.
So it's not a false narrative. Most attacks and spam do come from China, not the US.
If you have good security practices, you don’t have to worry about the script kiddies. If you have bad security practices, blocking Asia won’t help you.
This actually isn't true when browsing over Tor. It always makes you do 5 or more rounds of "click the traffic lights", "click the bicycles", etc. I don't know why. It seems to go far beyond checking that you're not a machine, and I suspect they're just abusing Tor users to get free machine-learning training.
I route all my traffic via my own VPN server at Hetzner for privacy and security reasons, and this Cloudflare bullshit is infuriating at times.
Besides, I'd guess 95% of sites that use their free tier either don't actually need it or would be better off without it.
Your IP is coming from the cloud just like all the actual bot traffic.
Basically: you cannot hide the information, you cannot make users jump through hoops (captchas, require signup/login, pay for accessing) to read them.
Presumably the fact that it's not the site owner mandating the captcha, but an intermediary service provider doesn't matter then?
This raises another related issue, too: CF is a data processor, so the controller (= site owner) needs to make users aware that their data is being shared with CloudFlare, since SSL terminates at CF, the content is analyzed, and it's then transmitted (optionally re-encrypted) to the origin.
They are not my hosting provider anymore.
How did I find out?
Traffic to the sites (all legitimate) fell by circa 75% overnight (non-English sites).
How do you know it was all legitimate?
I don't understand how cloudflare can stay in business if they cause a 75% drop in legitimate traffic to any website.
VC backing and recent IPO have them swimming in cash, but they currently aren't profitable.
We have to remember that those are real people and not just percentages of traffic that are affected when we make decisions like putting a captcha up against every visitor from certain countries. I like to think we were already doing that, but it is good to be reminded.
However, I am not quite sure I'm getting what the author is suggesting when they say that sites should forgo a CDN. Maybe I'm biased, but if you thought latency was bad when a datacenter near you went down for maintenance, try going all the way to the site's origin in New Jersey for every request. I am not aware of any way besides a CDN (or a CDN-like setup) that would get you good performance for people in all countries.
So I get the frustration with the captchas, and I get the frustration with the lack of multiple datacenters near you, but I wonder if you will make things worse for yourself by advocating to not have a CDN.
A) Serve your site quickly --- there's going to be a lot of unavoidable latency between you and the user, but anything after your server gets the request is on you.
B) Keep your page weight small. Of course, this is a good idea anyway, but transfer rates on high latency connections are more often limited by tcp slow start than bandwidth.
C) Use TLS 1.3 or at least TLS 1.2 with ALPN (which triggers TLS false start in at least Chrome, and I think other browsers), as these reduce the number of round trips during connection setup compared to standard TLS 1.2 or earlier. It's worth measuring http/1.1 vs http/2 vs http/3 to see what works best for a particular site on high bandwidth networks.
Honorable mention: make sure path MTU blackhole discovery is working. There are still plenty of networks with path MTU blackholes, and sending packets of the size clients said they could receive doesn't always work.
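To see why the handshake round trips in (C) dominate on high-latency links, here's a back-of-the-envelope model (the RTT value is illustrative, and this ignores slow start and DNS):

```python
# Time to first byte as a count of round trips: TCP handshake (1 RTT),
# TLS handshake (2 RTTs for plain TLS 1.2, 1 RTT for TLS 1.3 or TLS 1.2
# with False Start), then the HTTP request/response itself (1 RTT).
def ttfb_ms(rtt_ms, tls_rtts):
    return rtt_ms * (1 + tls_rtts + 1)

rtt = 250  # illustrative: a distant user hitting a far-away origin
print(ttfb_ms(rtt, 2))  # 1000 ms with plain TLS 1.2
print(ttfb_ms(rtt, 1))  # 750 ms with TLS 1.3
```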
Cloudflare's been working on improvements to the CAPTCHA system for Tor users (https://www.zdnet.com/article/cloudflare-ends-captcha-challe...), maybe some of those have benefited foreign countries as well. And CloudFlare does do country-level blacklists (which show up as the CAPTCHA behavior described) so maybe StackExchange had/has an overaggressive firewall.
I really don't think going back to not having Cloudflare-like services is a step forward. Is there another way?
I'd rather they track me, by IP address and/or cookies, and stop this nonsense. I accept the cookies and have a static, dedicated IP, but there's never an end to Cloudflare's captchas.
I have internet at home through Comcast, a huge and universally hated ISP in the USA. I also run a home server (though it only takes incoming connections, and does not visit websites itself). I purchased a dynamic DNS service that would update my domain's DNS whenever my home internet's public IP changed.
In over 5 years and through numerous modem reboots, my IP address has not changed once. A year ago I transferred my domain to another provider; I did not bother setting up dynamic DNS again and my website still works fine.
I have not purchased a static IP from Comcast. When I initially set up the server I had read that my home's IP address can change anytime the modem reboots, or possibly anytime at all, to any IP in Comcast's pool - which is why I subscribed to dynamic DNS.
So a static IP may not be as unusual of a setup as you say it is.
(Edit: I mean that the static IP was almost always included by default in the standard setup, I didn’t have to request or purchase one)
CF customers probably know that any delays cost viewers/visitors, but losing a few good visitors is worth preventing a ton of bad visitors. And CF generally seems very thoughtful about their actions. If they make something unpleasant, they're likely to have a good reason for doing so.
Why exactly do they need captcha protection?
The default is medium, and the sites where you run into checks all the time are probably on high.
So far, nothing of value was lost.
PS. To be fair it’s not just coming from Cloudflare-protected sites. Webmasters and SaaS-app devs add this to their WAF layers everywhere. :(
Most people live in homes, not datacenters, which means that web sites expect human traffic to come from residential IPs, not datacenter IPs. What comes from datacenter IPs is an endless stream of costly abuse, so they get CAPTCHA'd (if sites want to support use cases like yours) or blocked (if they don't care about those).
But I had TONS of New Zealand websites captcha me without a good reason. They all used some shitty local providers.
In a few years, when I have a team of engineers and can spare the resources/expertise, we'll come off CF and do it properly. Until then, CF is a great service.
Sorry that your experience from SEA is not great, but tbh we're not selling in SEA, so any traffic from there is just a resource drain on our servers. Anything to discourage traffic from areas we're not serving is a positive for us.
Anything to discourage bot traffic is a huge positive. CF won't stop the bots, but costing them some minor amount of money per visit is still positive.
It's barely anything at all, but my personal website is behind cloudflare and I've never had any trouble.
I’ve always wondered how all the expats in Bali accomplish anything. Everybody says they can work remotely via the internet and run their businesses, but when I was there the internet would go down for the entire island frequently and could be down for minutes or hours.
CF has the ability to read/alter the information we are sending to (or receiving from) the actual website. CF also has the ability (I do not mean they do) to impersonate the original website without the owner’s knowledge, and with visitor’s trust.
So, websites are protecting themselves as best they can, while pissing off the fewest of their customers, by putting up more security measures that block by country IP.
The real issue is the authoritarian governments that are making it so countries (like China) are completely fire-walled off from the internet.
Have you heard that Iraq is having massive street protests at the moment? If not, it might be because the government cut off internet access so a lot of the stories and media about the protests are not making it onto the social media sites. That's scary.
There are two complaints,
(1) cloudflare requires a captcha for visitors from some regions (like SouthEast Asia)
(2) cloudflare does not have enough nodes in SouthEast Asia, and OP feels being rerouted to another node defeats the purpose of a CDN.
Yet, Cloudflare does (1) because they often see attacks from those regions. I'm not sure blaming Cloudflare for this is the right strategy. Regarding (2), CDNs do not just benefit users, they benefit the website too. Getting rid of the CDN is not a solution. Is there a better free CDN for that region? Is multi-CDN easy to setup?
Why not? The whole everyone-needs-Cloudflare is a made up problem, which depends on many false narratives.
And why have we, as website visitors, never heard of Akamai, yet it's hard to find anyone who's never seen these captchas from Cloudflare and Incapsula?
> Is there a better free CDN for that region?
You can only find free mice in mousetraps.
I've never seen anyone say "everyone needs cloudflare" except maybe CF itself.
> why have we as website visitors never heard of Akamai
Akamai is not free, they have a trial period that is free. It's not in the same space.
> You can only find free mice in mousetraps.
I don't understand your point. You feel CF is a trap?
Of course it is. If you aren't paying for service, you're not the customer, you're the product.
The whole mandatory TLS campaign is part of the lock-in, too.
CloudFlare protects websites dedicated to doxing, stalking and harassing their victims. And when their victims complain, CloudFlare forwards all their personal information directly to said website owners while doing nothing about it. Websites that host content like the Christchurch massacre video and manifesto. Websites that have bullied people to the point of suicide.
I'm all for the importance of freedom of speech and being able to say offensive things on the internet, but CloudFlare is protecting sites that flagrantly violate the law.
While 8chan being kicked off CF did take them offline (their new anti-DDoS provider was told by its upstream bandwidth provider to kick 8chan off, and the domain hasn't worked since, to my knowledge), The Daily Stormer is still working. CF kicking TDS off their network didn't stop their website from working.
But when a site clearly crosses into blatant criminality, it's disappointing when everyone who has the power to rein them in (Google with PageRank, CloudFlare with protection, the US government with criminal proceedings) decides to pass the buck on to someone else. It's always someone else's problem, meanwhile people's lives are being ruined and lost.
These sites would not be as highly profitable without CloudFlare's network, so I think it's fair that if someone wants to use CloudFlare, they're aware of what they're supporting when they give CloudFlare their money.
IMHO, their failing to be the prosecutor, judge, jury and executioner for deplatforming inconvenient content isn't quite one of them.