
Cloudflare, Google, Bing Destroying the Infrastructure of the Free Web - dcassett
http://gigablast.com/blog.html
======
calmworm
The article has no substance at all. It’s like 150 words and just describes
Cloudflare a bit and explains how captcha works.

Not what I’m used to seeing on HN frontpage.

~~~
johnklos
Yet you respond in the fashion of a typical fan without any substance at all.

It's a legitimate complaint about a company that doesn't care about the
negative impacts of their decisions. Do you disagree with the facts of the
complaint?

~~~
calmworm
The facts of the complaint are that Google, Bing, and Baidu are secretly
running or maybe influencing Cloudflare and they purposely implemented a
non-accessible captcha to inhibit the smaller crawlers' ability to build their
index ... no, I don’t agree. Not based on a blurb that contains zero evidence
or sources.

------
polyomino
Very click-baity title IMO.

I think they're right about Cloudflare making legitimate scraping more
difficult. There's little reason to believe that it's to protect the big
players in search though.

------
MauranKilom
> Now, if you don't retain that cookie for each subsequent access to that same
> website from that same IP address the delay is repeatedly increases until
> like around 30 seconds at which point you will be presented with a Turing
> test.

So retain the cookie they give you? Is that a problem?

(Honest question, I have not dabbled in writing crawlers...)

~~~
core-questions
> So retain the cookie they give you?

Ahh, but the cookie is created client-side using some Javascript which does
some computationally intensive stuff for a second. Doesn't bother you as an
end-user, but if you're writing a crawler and you're not driving a headless
browser (expensive) then you probably don't trivially have the ability to run
arbitrary Javascript code (or else, you have the work of integrating Deno or
something to do that part for you).

Either way it means you can't just curl the webpage and get it. That's
obviously the point when defeating DDOS attacks is the use case, but it
doesn't work for crawlers, many of which are legitimate "users" like in the
article.

These services should offer some other easier proof-of-work mechanism.
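
For illustration only, here is a minimal hashcash-style sketch of what a simpler proof-of-work could look like. It is not how Cloudflare's challenge works; the function names, the `challenge:nonce` string format, and the choice of SHA-256 are all assumptions made for the example. The point is that the client only needs a hash loop, not a Javascript engine, while the server checks the answer with a single hash.

```python
import hashlib
import itertools
import os


def make_challenge() -> str:
    """Server side (hypothetical): issue a random challenge string."""
    return os.urandom(16).hex()


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the hash has enough leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

Each extra bit of `difficulty` roughly doubles the expected client work, so a server could in principle tune the cost per request. The same trade-off applies as with the Javascript version: the work is still shifted onto the client.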

~~~
cnst
> computationally intensive stuff for a second. Doesn't bother you as an end-
> user

Actually, it does. They waste the CPU cycles of my devices for absolutely no
good reason. It's very environmentally unfriendly as well. It shifts the cost
onto the end user, not a very nice way to go about it. They probably don't
describe the drawbacks to their own customers, either, so everyone simply
opts in thinking there are no drawbacks; but the end result is a diminished
user experience and tonnes of extra CO2 emissions throughout the world.

~~~
core-questions
I agree 100% and wish there was a better mechanism to prove you're not an
attacker, but it's hard to think of one that isn't annoying like a traditional
CAPTCHA is.

~~~
cnst
The bigger question that no one's asking is the cost to generate the page:

Does it take them 1 second of CPU time to generate the page?

* If not, isn't that a disproportionate amount of time for the client to do some silly throw-away work?!

* If yes, why don't they improve their infrastructure such that static pages could be properly cached as they should be, and slightly stale versions could be served to everyone at a lower total cost than if you require even a few select users with "abnormal" parameters to solve the captchas?

At the end of the day, all these DDoS protections are placed in front of pages
that by all accounts should be cacheable static pages, which should take less
time to produce and consume than the repeated 5-second JavaScript captchas
that they replace these static pages with.

The underlying issue is that one solution could be sold as a standalone
one-size-fits-all product, but the other one cannot, so that's why we have to
face daily disappointment if our browsing setup is "abnormal" in any way.
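
To make the caching side of this concrete, here is a rough sketch of a cache that serves a page for up to `max_age` seconds before rendering it again. It is only an illustration of "serve slightly stale static pages"; the class name, the `render` callback, and the 60-second default are assumptions, not anything a particular site or CDN is known to use.

```python
import time


class StaleTolerantCache:
    """Serve a cached page until it is older than max_age, then re-render."""

    def __init__(self, render, max_age=60.0, clock=time.monotonic):
        self._render = render    # hypothetical function: path -> page body
        self._max_age = max_age  # how stale a served page is allowed to be
        self._clock = clock      # injectable clock, which helps testing
        self._pages = {}         # path -> (rendered_at, body)

    def get(self, path):
        now = self._clock()
        cached = self._pages.get(path)
        if cached is not None and now - cached[0] <= self._max_age:
            return cached[1]     # cheap path: no rendering work at all
        body = self._render(path)
        self._pages[path] = (now, body)
        return body
```

With something like this in front of a page, the expensive render happens at most once per `max_age` window per path, regardless of how many clients ask for it.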

------
joeraut
> You have to be a non-blind person to pass the Turing test, as Cloudflare
> does not offer a handicap option.

(Edit: not sure if the above is true.)

I was quite surprised to see this. Much effort has been put into making the
web more accessible; it’s a shame if an otherwise accessible site is blocked
behind a non-accessible captcha wall.

~~~
calmworm
I don’t even think this part of the article/blurb is true.

Edit: I stand corrected, looks like Cloudflare recently moved to hCaptcha,
which does not offer an “a11y” option.

~~~
h3h3
"How it works: first, an accessibility user signs up at this URL, which is
linked in the hCaptcha widget info page. They are given an encrypted cookie
that can be used several times per day, but must be refreshed every 24 hours
via login."

[https://www.hcaptcha.com/accessibility](https://www.hcaptcha.com/accessibility)

~~~
kiwijamo
How on earth is this considered a reasonable accommodation for people with
access needs? Stinks of something created with no consultation whatsoever with
the accessibility community.

------
judge2020
hCaptcha (the service CF switched to after Google decided to start charging
for reCAPTCHA for large-volume customers[0]) has an accessibility option that
bypasses their captchas, and it's available at:
[https://www.hcaptcha.com/accessibility](https://www.hcaptcha.com/accessibility)

0: [https://blog.cloudflare.com/moving-from-recaptcha-to-
hcaptch...](https://blog.cloudflare.com/moving-from-recaptcha-to-hcaptcha/) (
[https://news.ycombinator.com/item?id=22812509](https://news.ycombinator.com/item?id=22812509)
)

~~~
dudus
What large players did Google charge for recaptcha other than cloudflare?

~~~
judge2020
I don't know of any specifics, however the pricing change is public:

[https://cloud.google.com/recaptcha-
enterprise](https://cloud.google.com/recaptcha-enterprise)

[https://www.google.com/recaptcha/about/](https://www.google.com/recaptcha/about/)

> Free up to 1 million Assessments / Month

------
mmaunder
Sure the author may not win a Pulitzer, but the point re accessibility, and
the big SEs being stakeholders, creating a moat for any new crawlers, is
valid.

------
johnklos
As others constantly show us, illegal is only illegal when someone actually
enforces the law. Even if it weren't illegal, if it only impedes a small
subset of people, or if it impedes people without money / resources, then
Cloudflare won't give the tiniest of a damn.

Their record is clear on this. They only care about those who give them money.
Any attempt to give you things for free is to bait you into becoming
dependent on their platform.

------
vikramkr
This is a weird article - it looks like it's pushing a conspiracy that
Cloudflare is secretly an agent for Google, Bing, and Baidu simultaneously?

------
jyrkesh
As others have said, this article is mostly nasty rhetoric with very little
substance. I'm surprised that the CAPTCHA isn't accessible, but that's about
it.

Websites opt-in to CloudFlare DDoS protection. If you want to be crawled, you
don't have to use it. But it's very difficult to expose yourself to the open
internet nowadays unless you're hosted in a cloud or have something like
CloudFlare.

I have stuff I don't want crawled at all, I use CloudFlare, and it's an
awesome (free!) service that helps me maintain HTTPS certs and keep Chinese
and Russian IPs from hammering my server.

~~~
johnklos
Nasty? Methinks your reaction shows you to be a fanboi who doesn't like
someone speaking truths about the object of your fandom.

~~~
mcdoogal
Not a very effective retort when GP lays out their thoughts and justifies
their position...

------
kgraves
What is it with these pseudo-doomsday clickbait "the free and open web is
dead" type posts I keep seeing?

Is the web really dying or being destroyed? I don't get how it is and this
article doesn't explain this either.

------
brlewis
Is there no link to the post apart from the top-level link to the blog?

~~~
shakna
The post has a name attribute (cloudflaredestroy), so you can generate a link
(a href='cloudflaredestroy'), but you can't have a direct URL to it, as it
doesn't have an id.

------
greatjack613
The solution is simple. Can some Cloudflare engineers here grant this search
engine the same access they give the Chinese ones?

~~~
zerotolerance
They failed to present data that exceptions are made for other search engines.

~~~
greatjack613
Valid point, but I do not think they are lying. I am sure they can present it
to Cloudflare if required.

~~~
joshuamorton
It's quite likely that Baidu is doing the reasonable thing and using a more
sophisticated crawling mechanism to cache the cookies. You have to do this
anyway to be able to crawl much of the web which is highly js-dependent.
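
As a rough sketch of the "cache the cookies" part, a crawler could keep a small per-host store and replay whatever it was given on later requests. This does not solve the Javascript challenge itself; it only avoids repeating it once a cookie has been obtained. The class and the cookie name `clearance` in the usage note are made up for the example, and the sketch ignores Path, Domain, and expiry rules that a real cookie jar (for instance the standard library's `http.cookiejar`) would honor.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlsplit


class CookieCache:
    """Remember cookies per host so later requests can send them back."""

    def __init__(self):
        self._by_host = {}  # host -> {cookie name: cookie value}

    def store(self, url, set_cookie_header):
        """Record the name/value pairs from one Set-Cookie header."""
        host = urlsplit(url).hostname or ""
        parsed = SimpleCookie()
        parsed.load(set_cookie_header)
        jar = self._by_host.setdefault(host, {})
        for name, morsel in parsed.items():
            jar[name] = morsel.value

    def header_for(self, url):
        """Build the Cookie request header value for a URL (empty if none)."""
        host = urlsplit(url).hostname or ""
        jar = self._by_host.get(host, {})
        return "; ".join(f"{name}={value}" for name, value in jar.items())
```

For example, after `cache.store("https://example.com/a", "clearance=abc123; Path=/; Max-Age=1800")`, a later call to `cache.header_for("https://example.com/b")` gives the string to send in the `Cookie` header for that host.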

------
monkin
So some bizarre and absolutely insignificant search engine is ranting about
being irrelevant? Good for them.

------
bxwalters
Google seems to be destroying itself, fortunately. The search results got so
bad after the recent changes that for the first time duckduckgo is superior,
even for technical searches.

I'm slowly making the switch now.

