If I were to write a bot, copying a current browser's user-agent would literally be the first thing I'd do.
Sort of like having to take off your shoes when you board a plane. If that’s what it takes, isn’t it just better to stay home?
Removal of shoes, 'naked' full body scanners, these are all terrible, and I tell myself every time it isn't worth the hassle.
The reality is that as much as I hate it, I'm still flying every other week.
I'm also on the Internet daily. I don't see that changing.
I hate the direction the internet and tech are going, and I hate even more that I'm seemingly powerless to do anything about it.
I hate it.
The web sucks. Society/civilization is shaking in its foundations.
I just wish the passive non-violent approach would work. It worked for Gandhi, but in this day?
I feel we're all getting overrun by technology. Unfortunately so, since it could have been the opposite.
Tested with Cloudflare and many, many other servers over many years.
On the whole, taking the entire web into account, it is rare for a user-agent string to be required.
However, it has become common for servers to make many assumptions based on user-agent strings.
I would guess there are many tech workers whose entire job rests on the assumption that user-agent strings are always present, rarely manipulated^1 and accurately represent the user's hardware and software.
1. For example, changed using "Developer Tools" in the major browsers. Google's browser has some user-agent presets for "testing" in DevTools (Ctrl-Shift-I, then Ctrl-Shift-P and "Show Network conditions" to open the drawer). Those should be safe to use for logins to Google websites. Try them out, e.g., when logging into Gmail, and watch how the user can request vastly different web page styles based only on the user-agent string.
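To make the same point in code: the header is just a client-supplied string, so anything from curl to a short script can claim to be any browser. A minimal sketch in Python (using the requests library; the URL is a placeholder and the UA string is only an example):

    import requests

    # A spoofed user-agent: the server has no way to verify that this string
    # describes the software actually making the request.
    FAKE_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"

    resp = requests.get(
        "https://example.com/",           # placeholder URL
        headers={"User-Agent": FAKE_UA},  # one header field, entirely client-controlled
        timeout=10,
    )
    print(resp.status_code)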
For context, this is what I had set (and, for quite some time, it was working): "Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecho/20100101 Firefox/57.0"
Ironically, I set this so that I could continue logging in to Google, since I had been unable to log in to Google Apps without setting this user-agent string.
What did it fail on? The misspelling of "Gecho"?
As others have echoed, this is probably a huge marker for malicious bots to Cloudflare.
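Purely a guess at the kind of check that would trip on this: a naive filter that insists a "Mozilla/..." user-agent also names a known engine token would flag the misspelled "Gecho" as an anomaly. A small sketch (the token list is my own illustration, not anything Cloudflare has documented):

    import re

    # Hypothetical allow-list of engine tokens a naive filter might look for.
    KNOWN_ENGINE_TOKENS = re.compile(r"(Gecko|AppleWebKit|Trident|Presto)/")

    def looks_anomalous(user_agent: str) -> bool:
        """Flag user-agents that claim to be a browser but name no known engine."""
        return user_agent.startswith("Mozilla/") and not KNOWN_ENGINE_TOKENS.search(user_agent)

    # The misspelled "Gecho" token from the comment above would be flagged:
    print(looks_anomalous("Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecho/20100101 Firefox/57.0"))  # True
    print(looks_anomalous("Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0"))  # False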
It is sometimes expensive for people to upgrade browsers, even though developers call them "evergreen" so they can avoid annoying support expenses for a few percent of people.
I had a phone running a Mozilla browser, which received updates until it didn't any more.
Then the only way to upgrade the browser was to purchase a new smartphone.
Unfortunately it was a superb device with no newer replacement, so to upgrade the browser I had to downgrade my smartphone for other uses, and pay the cost of an expensive new smartphone despite not really wanting one. But sites saw it as "you are running an old Firefox, you can obviously trivially upgrade".
I still have a perfectly good old Android tablet running an old version of Chrome which cannot be updated. Other than website compatibility, everything it is used for still works flawlessly. Perfect screen, sound, wifi, memory, battery.
For now, enough sites work on it that I still use it. That can be replaced easily with another tablet, but it is disappointing to have to spend cash and throw away a working product to e-waste, just to replace it with a functionally identical device because of the way the software treadmill works. (It doesn't have to work like that, it's a choice made by developers collectively.)
I mean, they said they gave long notice for the change, but I didn't think that a browser that "empowered users" and "gave them control of their machines" would ever do that. I mean, if every change has to be approved by Mozilla, why not just shrink wrap the browser and make me get it from Microsoft at Best Buy?
Long term support (ESR) Firefox releases are supported for about 15 months from release. And even that means using a major version that old, not a point version that old. Firefox 57 wasn't even an ESR, so it went out of support a couple of months after release.
For the Google issue, qutebrowser v1.9.0 does that already, see https://github.com/qutebrowser/qutebrowser/issues/5182
Having a Chrome UA is a MUST on WebKit-based browsers if you want Google's taxing services such as Earth/Maps/Gmail and so on to be faster and smoother than ever. Seriously.
Once you open Street View on luakit/vimb with a Chrome UA, the difference is night and day.
No clue about the issues with Google, perhaps some feature detection going on?
> It’s the same thing, recognizing that the MITM is neither male, nor human at all.
I don't see why this is important for a technical term. People hear the term as a slug, a group of words, not as discrete ones. No one actually pictures a man or anything else in the middle upon hearing the term. The difference is that the purpose of language is to communicate with others, and everyone understands man in the middle. I look up the "alternative" and get more results for "Henry the Hugglemonster" than I do for network traffic interception.
Thanks, I’ve always wanted someone to mansplain to me how I hear terms and what I picture while I hear them.
They're silently embedded in a huge portion of modern websites, and the average user will never even know about them.
But it seems to be way too easy for them to blanket-ban or serve an absurd amount of captchas to powerusers, linux gurus, privacy geeks, or anyone with the wrong combination of browser+addons. And the failures (as in this case) are often silent, cryptic, un-fixable from the user end, and can prevent us from accessing massive swaths of the internet. Any thoughts surrounding this conundrum?
1. Everyone stops using ReCaptcha/Cloudflare.
- Never going to happen. They dominate the market because they are useful, well-made services.
2. Launch a competing product that accomplishes the same thing.
- Good luck competing with these giants. Also, how would your implementation differ to solve this issue?
3. Powerusers and tech nerds must conform to 'normal' browser configurations and disable privacy addons in order to enjoy the internet with 'normal' users.
- Two steps backwards in every conceivable way. The giants gain more invisible power and powerusers suffer decreased productivity/privacy. Not going to happen.
We're in a "this is why we can't have nice things" predicament and you have malicious actors to thank for that, yet most people on HN only seem capable of attacking the few affordable solutions to that problem.
I'm even down with the theory that Cloudflare is a US government outfit, that's the only way I can wrap my head around such a generous free tier. But at what point does it worry you that the internet has so many fundamental issues that people willingly centralize behind such a large behemoth? How many options do I have when a kid is holding my forum hostage with a $5 booter service?
It's easy to shit on everything. Let's hear some real solutions.
It's by no means a full solution (there likely is no single full solution), and it may even be a bad solution -- but lately I've been trying to think about what the Internet would look like if we didn't have a massive arbitrage potential around server requests.
Part of the reason why everyone is trying to detect bots is because bots will very, very rapidly eat up your bandwidth and CPU time. We're used to offering our bandwidth/CPU for free to humans and either swallowing the cost if we're running a free service, or making up the cost in an adjacent way (ads, subscriptions, etc...). It's not bots that are the problem. It's that when someone asks our servers to do something, we do it for free. Bots are just a big category we can ban to make that problem smaller.
In many (but not all) cases, we shouldn't care about bots, and the only reason we do is because our systems aren't scalable to that level.
So I've been wondering lately what a server-defined per-pageload, or even per-request fee would look like on the Internet, maybe one that scaled as traffic got heavier or lighter and that was backed by a payment system that wasn't a complete rubbish privacy-disrespecting dumpster fire.
My immediate thought is, "well, everything would be expensive and inaccessible." But, the costs don't change. You still have to pay server costs today. Businesses today still need to make that money somehow. There are almost certainly downsides (all our current payment systems are horrible), but I wonder if it's more or less efficient overall to just be upfront about costs.
Imagine if I could put up a blog on a cloud service anywhere with scalable infrastructure. Then a post goes temporarily viral. Imagine if my server could detect it was under heavy load, detect that it was getting hit by bad actors, automatically increase the prices of requests by a fraction of a cent to compensate, and then automatically ask my provider to scale up my resources without costing me any extra money?
For a static site, suddenly I don't need to care if people or bots are hammering it, I don't need to care about anything except whether each visitor/bot is paying for the tiny amount of hosting costs they're hoisting on me. If bad actors start pushing traffic my way, I don't need to ban them. I just force them to pay for themselves.
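To make the idea concrete, the pricing logic could be as trivial as scaling a base fee with current load. A rough sketch (every number and name here is made up for illustration):

    BASE_FEE = 0.0001          # dollars per request under normal load (made-up number)
    COMFORTABLE_RPS = 50       # requests/second the current capacity handles cheaply

    def price_per_request(current_rps: float) -> float:
        """Scale the per-request fee with load: idle traffic is nearly free,
        while a viral spike or a hammering bot pays for the capacity it forces."""
        surge = max(1.0, current_rps / COMFORTABLE_RPS)
        return BASE_FEE * surge

    # A quiet day vs. a traffic spike ten times over capacity:
    print(price_per_request(10))    # 0.0001
    print(price_per_request(500))   # 0.001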
I thought bot detection was only done during registration and the like, to stop bots from sending spam to real users.
Except then there are those pesky CGNATs to handle, including the Great Firewall of China.
Anyway, high-profile spammers will emulate enough of the browser to render any measure based on browser anomaly detection worthless, including by using a headless browser.
The only way to defeat them would be to put some quite computationally intensive JS operation in the way... (On par with mining, ruining all the laptops, phones and tablets. But you can make it not trigger every time.)
This would make spamming expensive.
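Something like a hashcash-style proof of work, in other words: the client burns CPU finding a nonce whose hash has enough leading zero bits, and the server checks it with a single hash. A rough sketch (the difficulty and challenge string are illustrative only):

    import hashlib
    from itertools import count

    DIFFICULTY_BITS = 20  # illustrative; tune so solving takes about a second on a typical client

    def verify(challenge: str, nonce: int) -> bool:
        """Cheap server side: one hash to check the submitted nonce."""
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        value = int.from_bytes(digest, "big")
        return value >> (256 - DIFFICULTY_BITS) == 0

    def solve(challenge: str) -> int:
        """Expensive client side: brute-force roughly 2**DIFFICULTY_BITS hashes on average."""
        for nonce in count():
            if verify(challenge, nonce):
                return nonce

    nonce = solve("comment-form-2f9c")          # placeholder per-form challenge
    print(verify("comment-form-2f9c", nonce))   # True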
Server-side we have excellent AI spam filters that nobody seems to be using to fire off a captcha check later. The big problem here is that you cannot offload to some provider without inviting big privacy concerns. (Same problem as forum/chat/discussion platform providers.)
Do you think a botnet with 10k machines is going to be meaningfully inhibited by making each machine's cpu run calculations for a second or two for each submission?
I'm sure reCAPTCHA looks at the IP and IP block as one of the inputs to its ML algorithm, but as one or two of perhaps a dozen different features - including mouse movement and/or keyboard input, which is quite a bit harder to fake.
Based on actual experience of fighting spammers, that isn't the case. Like a lot of people new to spam fighting you're making assumptions about the adversaries that aren't valid.
Some will be stopped by the simplest protection mechanisms.
Some will be indistinguishable from real humans, and you won’t be able to stop them without crippling your services for your real users.
But those are the two extremes. The real problem is the ones between those extremes.
Every intentional stumbling block you put in the path to try and stop those in the middle might also have a negative impact on your real users. The real problem is that the most troublesome attackers will learn and adapt to whatever stumbling blocks you put in the path. So, how many of your own toes are you willing to sacrifice with your foot guns in the name of stopping the attackers?
This isn't easy and many firms fail at it, but it can be done and we routinely did it.
Seems highly unrealistic.
CPU, bandwidth, electricity, it's all just energy. And to a significant degree, money is just energy stored. I generate energy with my own work, store it in the form of money, and then transfer that energy to someone else, maybe to heat my home or cook me a meal.
Before money, I had to barter for those things. Maybe conceptually the internet is in a similar state at the moment. It doesn't have 'money'. Why can't I put CPUs in my wallet and then spend them? And why can't I charge visitors to my site by the CPUs they are costing me?
Instead, I have to, in a way, barter. For example, maybe I use ad revenue to earn my income, so I generate all this content, I barter that to the search engines, which barter with the advertisers, which barter with me, and I barter back to security guards to protect me from 'bad' actor bots. I'd really just like to receive CPU and bandwidth payments from them.
Not at all. Barter was quite uncommon and also impractical. Most societies used (and use) social connections and trust.
This sort of solution is frequently proposed but doesn't work, because:
• Serving costs are rarely the problem. Normally it's annoying actions taken by spammers and the bad reaction of valuable users that matters, not the machine cost of serving them.
There are occasional exceptions. Web search engines ban bots because left unchecked they can consume vast CPU resources but never click ads. However, they only get so much bot traffic because of SEO scraping. Most sites don't have an equivalent problem.
• There is no payment system that can do what you want. All attempts at creating one have failed for various hard reasons.
• You would lose all your users. From a user's perspective I want to access free content. I don't want to make micropayments for it, I especially don't want surge pricing that appears unrelated to content. Sites that use more typical spam fighting techniques to fend off DDoS attacks or useless bot traffic can vend their content to human users for free, well enough that only Linux users doing weird stuff get excluded (hint: this is a tiny sliver of traffic, not even a percentage of traffic but more like an occasional nuisance).
• You would kill off search engine competition. Because you benefit from crawlers, you'd zero rate "good" web bots using some whitelist. Now to make a new search engine I have to pay vast sums in bot fees whilst my rich competitors pay nothing. This makes an already difficult task financially insurmountable.
Doesn't medium do this?
That's not asking people to pay for bandwidth/compute power, it's selling something adjacent to your content that you hope makes up for the loss.
> People who serve ads don't want to pay for bots which is why they are a problem.
That's kind of my point. When you ignore the arbitrage potential of serving requests for free, it forces you to care about making sure that your content is only available to the "right" users. You have to care about things like scraping/bots, because you're not directly covering your server costs, you're swallowing your server costs and just hoping that ads make up the difference.
Theoretically, in a world where server costs were directly transferred to the people accumulating those costs, you wouldn't need to care about bots. In fact, in that world, you shouldn't care whether or not I'm using an automated browser, since digital resources aren't limited by physical constraints.
In most cases, the only practical limit to how many people can visit a website is the hardware/cost associated with running it. A website isn't like an iPhone where we can run out of physical units to sell. So if they're paying for the resources they use, who cares if bots make a substantial portion of your traffic?
> Doesn't medium do this?
No, Medium just sells subscriptions, you don't pay for server usage. As far as I know, no one does this -- probably in part because of problems I haven't thought of, also probably in part because there are no good micro-payment systems online (and arguably no really good payment systems at all).
The closest real-world example is probably AWS, where customers pay directly for the resources they use. But those costs aren't then directly passed onto the user.
Having said that, you could provide a central service where people would buy credit to be used on many sites. So micropayments aren't the problem.
That central service is going to lock out many countries and regions, as well as lots of people (minors, the unbanked, the poor, etc.) in the non-locked-out countries and regions. Payment is frigging hard, especially at the international scale. This is every bit against freedom of information and strictly worse than Cloudflare.
I doubt that many would attack those solutions if they actually worked well, but they don't. These "solutions" are a big part of the reason why the web gets smaller for me every day as more and more websites become unusable.
Cloudflare is very much anti-internet. And I'm a very security-obsessed person. Just like Reddit, I believe we need to dial things back a bit closer towards chaos: like a Venn diagram of (safety) ∩ (chaos), there's a balance, and I believe the internet is worse off when this balance is out of whack.
There might be some awful stuff on sites like 4chan, but it also generated a ton of the memes that later filtered down into mainstream internet culture. Culture and innovation often happen in the chaos and at the fringes, which is an area I believe the world is becoming completely intolerant of in some attempt at idealism. But there are real sacrifices in between (i.e., the mostly harmless stuff getting tagged as bad guys).
We need to be better at calming down, embracing the chaos, pushing back against FUD, and maintaining a good balanced default. That chaos and flexibility is what originally made the internet great and endlessly promising.
Based on the various posts I've seen from Cloudflare founders on here I'm not convinced they are taking this problem as seriously as they need to be.
However, I see people complaining about Cloudflare in lots of places other than here. The number of people adversely affected by Cloudflare is not small.
Regarding Cloudflare, a regular user will have no idea about what Cloudflare is and what they do. If something like the OP happens to them, they will just figure “the site is broken” and move on. So there could be a large hidden number of users who have suffered from overzealous Cloudflare blocking without being able to identify it as such.
My solution more and more is to just not bother with it. If a site is unreadable because I'm using uBlock and uMatrix, and I have to spend more than a minute or two tweaking things, then I just leave.
That said, I don't have any problem with Cloudflare. I'm much more annoyed by the overuse of *.googleapis.com. I'd love it if somebody would set up a service that I could point my hosts file at so that googleapis.com silently went somewhere else.
I think it would work fine with versioned libraries, fonts, etc. I'm thinking of setting up a container and squid config to achieve this.
Any obvious problems or alternative solutions?
Obviously enumerating the world's CDN URLs would be a task. But I think even covering the most common CDNs would be a benefit.
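Even a hand-maintained list of the big offenders would go a long way as a start. A tiny sketch of generating the hosts entries (the hostnames and the local proxy address are just examples, not a complete list):

    # Sketch: emit hosts-file entries that send a few common CDN hostnames to a
    # local proxy (here 127.0.0.1), which would then serve cached copies of the
    # versioned libraries and fonts.
    LOCAL_PROXY = "127.0.0.1"
    CDN_HOSTS = [
        "ajax.googleapis.com",
        "fonts.googleapis.com",
        "fonts.gstatic.com",
        "cdnjs.cloudflare.com",
    ]

    for host in CDN_HOSTS:
        print(f"{LOCAL_PROXY}\t{host}")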
Wrote a little post about how I configured my blacklists and whitelists with AdGuard Pro for iOS.
In other words: when someone demands "real solutions", they're typically expecting a degree of solution that quite likely just does not exist at all, to solve a problem that isn't as severe as people believe, just because that's the bar that those companies have set in the public discourse.
This makes it impossible for well-intentioned people to 'compete' with these services, because whatever alternative is suggested (hidden form elements, a random VPS provider with DDoS mitigation, serving assets locally, etc.) is immediately dismissed as "that can't possibly be as effective / effective enough", even though it'd be perfectly adequate for the vast majority of cases.
The alternative and competitive solutions exist, and have existed for a long time. You don't need a 1:1 replacement for these services. People just often refuse to believe that the simple alternatives work, and won't even bother trying.
(For completeness, my background is that of having run several sites dealing with user-submitted content, including some very abuse-attracting ones.)
They are immediately dismissed because I don't want to pay a fulltime engineer to play cat and mouse with skiddies on the internet.
CF has problems, but pretending it isn't solving a real issue that is nearly impossible to fix otherwise, especially for individual admins running a side project, doesn't help anybody.
And you are cherry-picking poorly sourced anecdotes to better suit your position.
A VPS with a 100 Mbps virtual adapter physically can't withstand a DoS from a single attacker with a fiber connection (or the equivalent of one): a single gigabit uplink can push ten times more traffic than that adapter can receive. This does not have much to do with the anatomy of DoS attacks, just simple math.
Cloudflare subsidizes their free users by giving away a bit of bandwidth for free, roughly the amount that can be purchased from a decent hoster for several hundred dollars. Of course, an attacker with several hundred dollars can easily rent a botnet that will demolish that "protection".
"All Cloudflare plans offer unlimited and unmetered mitigation of distributed denial-of-service (DDoS) attacks, regardless of the size of the attack, at no extra cost."
Do you know of an example of an attacker "easily demolishing" Cloudflare's free DDoS protection for a website with a few hundred dollars worth of botnet?
I can name dozens of websites that folded under Cloudflare's supposedly flawless DDoS protection (at the time when they were still using it). Of course, the ones who fold are always the websites themselves; Cloudflare itself is never affected, because when the DDoS gets particularly bad, they just detach websites from their CDN and expose them to the attackers.
If all those companies are fronts for various parts of the US intelligence community then we're really screwed, I suppose.
DDoS attack: If possible, the easiest solution is to just swallow the traffic. If that doesn't work, you want to block all networks that allow IP spoofing. Then it's a whack-a-mole game. And if you have the resources, use anycast and many colocations. Or ask your ISP for help.
Hiding your server: Use onion address via TOR network.
SSL certificate: Use Letsencrypt
Edge SSL/DNS/CDN: Use a fast web server or proxy, like Nginx. With Cloudflare the connection to the edge server might be faster, but the time to first byte (on your site) is often slower. So you get better bang for the buck by optimizing on your end.
Note that DNS by itself already has edge caching out of the box, for free! E.g., if a user looks up your domain, it will be cached both at their ISP and on the LAN. So you don't need Cloudflare for DNS.
What percentage of traffic on the long tail of the 95% smallest websites served by CF is malicious, then? So that we can talk in numbers.
An unusual UA is unlikely to move the needle on top line metrics, but it is a distraction and a misuse of resources to play cat and mouse. (Unless your business would be materially harmed by someone scraping your data... in which case, you’re doomed anyway.)
The old ReCaptcha, which did not need JS, did not serve you unsolvable challenges, and did not refuse to serve you because you used Tor or because you used the audio challenge too much.
The government has passed laws to allow itself to be sued under certain circumstances. The Federal Tort Claims Act (FTCA), for example, allows suits for a variety of torts.
I believe (but am not actually sure) that most normal business-type transactions with the government are covered under FTCA or other acts, so a breach of contract by Cloudflare-the-government-entity would probably be pretty much like a breach by any random non-government entity.
Still, if you were going to depend on that it would be a good idea to actually look into the details of the FTCA and other such acts and compare to the actual Cloudflare TOS.
I have no idea whatsoever how sovereign immunity works in the case of a corporation chartered under some state's corporate law (Delaware in the case of Cloudflare) that is owned (fully or in part) by the government. I'd guess that it could only possibly apply if the government owns enough of the company to have control.
Cloudflare is public, so we can probably not worry about that scenario. If the government actually controls them, it is doing it surreptitiously, and so even if sovereign immunity should be somehow applicable I'd expect that the government would not bring it up because doing so would necessarily bring to light their control.
But ReCaptcha has been broken for years now by several different means. At this point, it is so broken it's almost a scam (and just another way for Google to get personal data from as many websites as they can).
This is acknowledged by the original question
>> Also, how would your implementation differ to solve this issue?
It's hard to consider simply viewing content to be malicious or abusive, no matter how automated.
I'd love to agree with you, but the crawler problem is 100x worse today than it was a decade ago
Cloudflare have a long history of supporting those malicious actors, so it's not like the problem is unrelated to the purported solution.
I agree with the first two sentences, but disagree with the third. I believe that this state is actually the intended end goal.
Nowadays, not only is this totally impossible, blocking even a subset of a site's JS (such as through uMatrix) is trial and error to get the site to load at all, or to do simple tasks like click on a "login" button.
With Google's plan to "phase out" cookies, I expect the web to become even more opaque and difficult to modify "on the fly" -- that is, on the user's local machine prior to displaying the content. In particular, this will affect ad and tracker blocking the most, as the pain from effective ad blockers starts to bite harder and harder.
So, when you say "The giants gain more invisible power", that is true and desirable from their perspective, and since they write the code that actually underpins most web browsers, why wouldn't they?
When you say "powerusers suffer decreased productivity/privacy", yes, that's absolutely true. Why would they care? It's such a small fraction of their business. Some users will go to more and more extremes to preserve their privacy, eventually accessing only some small subset of sites from an esoteric Kali-derived distro, and others will capitulate and shift their behavior back to the herd.
In the end, the giants still win.
Rebooting the web into something else is still possible and something that will eventually happen when enough people are too tired about the current state.
This battle was lost a long time ago.
- To do so you need to avoid making the user experience worse
Disclaimer: I was part of hCaptcha team.
https://hcaptcha.com/ is competing with ReCaptcha. It's a drop-in replacement for ReCaptcha.
It's privacy-focused (it supports Privacy Pass) and fair: webmasters get a cut for each captcha that is solved correctly (they can choose to donate it directly to a charity of their choice), hCaptcha gets a cut for running the service, and the customer gets their images/data labeled.
Google/ReCaptcha is another thing. I have a hard time understanding any reason to put a captcha on a site that a normal incremental delay between login attempts (plus banning sources that keep at it for too long) wouldn't already cover. They're getting traffic data and ML training data, and neither one is required for the thing the captcha is trying to solve. Sites are just feeding Google's business, and the captcha is actually making the internet a worse place for humans.
(Captcha requirements for things like posting in a discussion could be handled by simple spam/bot detection; a captcha is just overkill.)
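For what it's worth, here is a rough sketch of that incremental-delay-plus-ban idea (thresholds and data structures are made up for illustration; a real setup would persist this somewhere):

    import time

    # In-memory tracking per source IP; purely illustrative.
    failures: dict[str, int] = {}
    last_attempt: dict[str, float] = {}
    banned: set[str] = set()

    BASE_DELAY = 1.0      # seconds of required wait after the first failure (made-up)
    BAN_THRESHOLD = 10    # failed attempts before the source is banned outright (made-up)

    def allow_login_attempt(ip: str) -> bool:
        """Each failure doubles the wait before the next attempt is even considered."""
        if ip in banned:
            return False
        required_wait = BASE_DELAY * (2 ** failures[ip]) if ip in failures else 0.0
        return time.time() - last_attempt.get(ip, 0.0) >= required_wait

    def record_result(ip: str, success: bool) -> None:
        last_attempt[ip] = time.time()
        if success:
            failures.pop(ip, None)
            return
        failures[ip] = failures.get(ip, 0) + 1
        if failures[ip] >= BAN_THRESHOLD:
            banned.add(ip)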
EDIT: Another user posted this below, answering my question:
sgtfrankieboy wrote:
In CloudFlare go to "Firewall" and then click Settings on the right.
Here you can set the Security Level and if you want to use Browser Integrity Checks among other things.
Reputable VPN services do a good job of keeping their IPs off blocklists. Occasionally I'll get blocked, because some jerk has been abusing the VPN server that I'm exiting from. But if it doesn't resolve promptly, I just switch to a different exit server.
So I only use this VM, and this VPN exit, as Mirimir. And given that, I don't go out of my way to prevent tracking. Not enough, anyway, to trigger blocking. Because I don't really care if everything that Mirimir does gets linked. Indeed, I pretty much always use "Mirimir" as my username, or sometimes "Dimi" or whatever.
If I don't want stuff linked, I use a different persona in a different VM, using a different VPN chain. Or that via Tor using Whonix.
Would sites then move to this and reduce the lock in and inflexibility with ReCaptcha?
I won’t link, but search “ReCaptcha solver” and you’ll find plenty.
It highlights just how broken the system is. It doesn't stop determined spammers/devs until the value of the task is lower than the cost to solve it.
Considering it’s 50c USD per 1000....
Adding a short timeout eliminated my contact form spam. I also only allow JSON on the back end, so they must execute JS to even have a shot.
This has allowed me to avoid blocking TOR exit nodes... So far anyway.
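Roughly what that looks like on the back end (field names and the threshold are my own illustration): the form records when it was rendered, and the endpoint rejects anything that isn't JSON or was submitted faster than a human could plausibly type.

    import json
    import time

    MIN_SECONDS_TO_FILL = 5  # humans need at least a few seconds; made-up threshold

    def accept_submission(content_type: str, body: bytes, rendered_at: float) -> bool:
        """Reject non-JSON posts and anything submitted implausibly fast.
        rendered_at would come from a server-issued (ideally signed) hidden field."""
        if content_type != "application/json":
            return False                      # dumb form-POST bots fail here
        if time.time() - rendered_at < MIN_SECONDS_TO_FILL:
            return False                      # filled out suspiciously fast
        try:
            json.loads(body)
        except ValueError:
            return False
        return True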
For example, botnet traffic is best stopped at the origin. If there were some pressure on the service providers, I'm fairly sure they could do more to detect subscriptions with compromised devices and take appropriate action. Actions can include educating the users and, if necessary, blocking the subscription until the problems are fixed.
While this certainly would not immediately cover the whole world, it would be a start. On the website level, you could then treat traffic from networks that have agreed to cut malicious traffic in a different way.
Because I freaking hate those captcha puzzles...
The underlying incentive here is that centralized websites are slow and vulnerable to DDoS. Massively mirroring their content is the solution, and it's what Cloudflare does. Let's do it in a way that protects human rights rather than taking them away.
GP already mentioned why #1 is not a solution (which I see the same way) and OP made it quite clear that not visiting CF sites isn't quite working either.
That would also explain why switching to Chrome fixes the issue.
Most people building custom browsers are doing it to do something Chrome would disallow. One instance would be supporting only one weak-ish cipher, forcing TLS to use a predictable cipher instead of choosing the best available encryption for transit. While I agree some people have cool browser projects that would be nice to use, it's a side effect of bad actors abusing the system. Most of the annoying parts of Cloudflare exist because bad actors have abused the system.
It's maddening, but it's true. I've seen tales of people having to modify resource auto-generators that created URLs with hexadecimal identifiers in them, because the sequence "ad" in a URL would trip ad-blocking browser plugins. You might ask yourself "how many ad companies worth their salt have 'ad' in the URL path?" and the answer is "The ones who are worth their salt might not, but the ones who are terrible do, and they're probably terrible at other things too, like letting malware on their network."
I went to school at a place that had a policy of soft-blocking network access for any machine that a portscan detected had TCP or UDP 12345 opened, because Back Orifice defaults to that port and people who built trojan horses to allow remote access didn't change the default. It caught a reasonable number of owned machines every year.
Don't overestimate criminals; if most were good at being criminals, they could be successful in society without having to break the law. ;)
Completely possible to work around of course, but it does increase the effort level quite a bit.
Chrome is not, and isn't meant to be, DRM. There are DRM extensions for that, but Chrome (and let's extend this statement to any other whitelisted browser) does not try to limit what you can do to a website. The only restriction I can think of is the common ports thing, but if you want to connect to port 25 (typically for SMTP/email), go ahead and change the about:config setting and you can do it.
You would be SHOCKED how many bad actors use an outdated UA or some random string they think is funny. This portion of CFs mitigation isn't meant to be hyper-advanced detection, just bounce out the low hanging fruit. They have other security services that aim to mitigate the more advanced stuff (like the WAF).
Because if not, what you're describing is a cartel colluding to keep the market controlled by oligopolies. Regardless of whether there's a good reason for them to do so.
So Cloudflare is intentionally breaking the web? Good to know.
Is this such a novel thing to look for outliers in web traffic and offer ways to mitigate risks?
Since your User-Agent string probably starts with "Mozilla/5.0" anyway despite not being Mozilla, you might as well just set it to Chrome's despite not being Chrome. (Vivaldi did that for other reasons: https://vivaldi.com/blog/user-agent-changes/)
It's laughably adorable to think it's actually solving a problem or helping in any way; the 'bad actors' it's trying to prevent probably have a workaround anyway.
They do—changing your user agent is trivial.
To the target market, who have a spam issue, Cloudflare's protection sounds great, and by the time they've set it up, they won't switch CDNs just because it isn't effective enough.
Won't protect against scrapers that execute JS (like ones based on headless browsers -- This includes some modern search engines!)
If a botter really wants to, it's easy to get emails scraped. But they don't care. The demographic of people who have obfuscated emails on their page via Cloudflare (since you probably don't know every obfuscation solution out there, you target the big ones) is also the demographic with a good spam filter (or just using Gmail).
Botters don't care about every small target. If you're bigger, you do get better ones who probably target you specifically, and then you have more problems than just having your email stolen.
The 99% solution from Cloudflare is complex enough to not get botted by shitty wannabe hackers.
A better solution would be to change the email system to hinder abuse.
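For a sense of how low the scraping bar is: the commonly described data-cfemail format is just hex with the first byte used as an XOR key for the rest, so undoing it is a few lines (the sample value below is made up, not taken from any real page):

    def decode_cfemail(encoded: str) -> str:
        """Undo the data-cfemail encoding: byte 0 is an XOR key for the remaining bytes."""
        key = int(encoded[:2], 16)
        return "".join(
            chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2)
        )

    # Made-up sample value in that format:
    print(decode_cfemail("423731273002273a232f322e276c212d2f"))  # user@example.com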
That's a pretty clever way to both deter bad actors and ensure legitimate users get uninterrupted access to websites.
What you're talking about is more like what Hashcash does, where it essentially replaces CAPTCHAs with a cryptocurrency miner, such that bots become more expensive to run due to the amount of energy they consume. The downside is it's not great for battery life for regular users either.
I assume it's somehow redirect-related and that's why these sites tend to trigger it.
- you tapped the “Siri suggestion” result, which completely skips the SERP. I hate that “back” doesn’t bring you back to what you typed in the search/URL bar
No they don't.
I could sign up for the ten most popular micropayment services and the fees would be about the same as if I signed up for just one.
It exists just as much as your government monopoly. We're talking about a world where sites let people pay microtransactions, not the current state of the real world. I'm just pointing out that it is definitely not necessary to have government control to have really small transaction fees.
> Does it cost anything in time and effort to use them?
Whatever the government version you proposed would do, let's keep it simple and make them act exactly the same.
> Who do you trust to tell you which micropayment services are good and trustworthy?
I dunno, who tells you that visa or stripe or kofi is good and trustworthy? If creators flock to a site, they'll draw in users.
Shameless shill: qutebrowser is by far the best browser I've ever used. The half measure of using addons (even powerful ones like Pentadactyl) cannot be compared to having a browser that is power-user friendly in every aspect, from config to UI. If a site doesn't work well with it then I'm probably not going to use that site. If I can move away from Google then I can find your article/post somewhere else.
[Nitpick: it's Tor not TOR]
One more step
Please complete the security check to access <whatever>.
Wasn't HN also behind CloudFlare? Looks like that changed, but maybe it will be again in the future.
As for the support@ address not being on those error pages: decent feature request. I imagine the reason they want to avoid that is that many of these errors are delivered at the request of the site owner or are related to the site not working (404s, 503s, IP firewall blocks, etc.), so they do not want to funnel people into Cloudflare support for issues that are not specific to Cloudflare.
Determining which errors are the site owner's responsibility and which are Cloudflare's can be quite tough.
"many of these errors are delivered at request of the site owner" For those, put the site owner's contact method there. Even a physical mailing address, fine by me, I'll send a letter (something a spammer would not do) if it's important enough to me to do so.
"or related to the site not working (404s, 503s" those pages don't deliver a Google CAPTCHA or don't say "You have been blocked". If they can determine whether a page should have a captcha and/or that text, then that if statement can also include showing contact info.
The captcha page, sure, maybe. I can't think off the top of my head what would happen on that page that wouldn't be related to Cloudflare/reCAPTCHA, and I conceded that it's a decent feature request. But plenty of actual interstitial pages served by Cloudflare aren't necessarily caused by Cloudflare. The fact that you get a captcha at all isn't Cloudflare's choice most of the time, it's the site owner's. And having firstname.lastname@example.org on that page would 100% cause people to write in saying they don't want to see captchas; Cloudflare isn't the appropriate party to ask about no longer seeing captchas for a specific site. Now, SOMETIMES it's an automated incident because of your IP, so then you DO want to reach out to Cloudflare.
Same with 500 series errors. Sometimes it's the website not responding, but sometimes it's Cloudflare not interacting properly.
So yeah, I think the truth of the matter is in the middle here. In terms of priorities, I have no doubt this is pretty low on their list. Why would it be any higher when they serve the technical purpose they were created for? The rest of that is QoL with minimal impact on customers compared to many other issues that go wrong with the network that have considerable impact on customers and visitors.
It has been a long time since I used CF though, so maybe there is a question in the setup phase or only a few settings.
I thought the point was anti-DDoS by just proxying your traffic through someone with bigger pipes. That they do TLS offloading to filter n-days like Heartbleed helps as well of course, but those are super rare events and it sounds like what you mean is ongoing.
What kind of bad actors do you mean, and what kind of sites? Don't have to mention the domain or anything, just that it makes a difference whether it's a web shop (financial risk I guess), more like a forum (spammer / hater risk), or something else.
But a layer 7 DDOS attack, when going through Cloudflare, means the malicious actor needs to have IP addresses that are at least not complete trash in terms of IP reputation. Getting access to a botnet and access to these IP addresses isn't exactly prohibitively expensive, but it's a much larger barrier to entry.
It's even harder to get taken down by a layer 7 DDOS attack on Cloudflare if you use "im under attack" mode, assuming your attacker isn't paying even more for the botnet to run something like Chromium or node to hit your website.
Finally, while Cloudflare doesn't actively do this for small-scale DDOS attacks (since it might just be a spike of users), they do have Gatebot for the larger scale ones https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-u....
I am a normal user though, and I (thought I) had absolutely no way of viewing your content if you use Cloudflare.
Turns out using Chrome works...
Cloudflare passed that point a while back. They have so many policies, departments, shareholders and government departments to please that it is now impossible for them to be a truly good force for the open internet.
Obviously this is all just an educated guess, since I've worked on building scrapers for Cloudflare-protected websites.
CloudFlare is ruining the internet for me (2016)
And the ensuing Hacker News discussion a few months ago:
It's kind of like saying "it's your own fault you didn't de-select the "track everything I do" checkbox on our privacy page".
Here you can set the Security Level and if you want to use Browser Integrity Checks among other things.
edit: do not follow my advice
Short TTLs are not honoured by everyone so you'll experience some downtime.
Can't conclude much from this; Sitetruth has been reading sites openly for years in a well-defined way from a well-known IP address, examining them for ownership info about once a month. Although it looks at millions of sites, it never hits any one site very often. From Cloudflare's perspective, that's harmless.
I keep trying to live without JS but so little of the internet works.
* Gitlab/Github (obviously)
* Google maps (obviously)
* Linkedin... (uh... less obviously)
* Rust docs
* Google Cloud Docs
A lot of the internet is butchered without JS.
I really want a way of just blocking third-party JS (i.e., the site can deliver its own JS, but not anything it tries to import unless whitelisted). But that seems to be hard with qutebrowser.
FWIW, uMatrix apparently has a method for doing this.
That said, I do not use many of the obvious mainstream sites - e.g. I ditched Github like dirty socks the moment Microsoft grabbed them.
But yes, modern web (not the Internet, mind you) is very damaged, and I fear it will take decades to fix the damage, once (I hope) smarter people take the reins after high-visibility security and privacy incidents become more and more frequent, and, well, more visible to general public.
He is willingly using Google because they provide amazingly useful services completely free of monetary charge. Are you objecting to the fact that Google benefits in some way by providing this service?
I'm guessing it's very basic checking, because deep browser fingerprinting is supposed to be against the law in some countries (I stand to be corrected on this statement).
I'm not personally a fan of CF because of the amount of data they can potentially obtain (or do obtain), but there's a lot of crap out there, and their firewall is robust enough to protect John's Cowboy Store from contributing to some dude's Monero-mining botnet.
It's like complaining that airport security checks your bags when you have gun-shaped objects inside it.