Cloudflare’s CAPTCHA replacement with FIDO2/WebAuthn is a bad idea (herrjemand.medium.com)
223 points by herrjemand on May 14, 2021 | 284 comments



The way I'd put it is that Cloudflare's suggested implementation may have its issues, but the general idea of trying to verify that someone is a human and then providing this verification to services in a way that is 1) anonymous and 2) cross-compatible with other services, is the correct way to go about things (or at least has some very appealing features).

I hope that in the future we have something that does this job very well, so that services do not need to verify phone numbers, Google accounts, or even IDs and facial imagery just to allow someone to use them (services reach for those because it's much easier than coming up with new captcha styles that humans can quickly and easily solve but that basic machine learning and scripting cannot).

Being able to use the Internet with the slightest bit of privacy is already ~impossible for the average user and extremely difficult and tedious for very knowledgeable and experienced ones, so anything that pushes back against the current trend sounds like it's at least attacking a problem worthy of our attention.


Alternatively, if services demand a fee then there is no need for human verification.

Instead of trying to solve anonymous human verification, we could just as well make micropayments an option.


Sometimes serving 429s/403s to unauthed users is already costing you too much in egress bandwidth bills. That’s one of Cloudflare’s main propositions: keep that “idiot bot that will never get what it wants, but keeps requesting it anyway” traffic outside your network. (Note: not the same as a DoS! Usually not intentional, and usually not actually bringing your infra down. Just costing you money, while not making you any money.)


Is there a reason Cloudflare can't send a 429/403 in that situation if you're already using it as a middleperson?

It seems like responding to a bot and responding to an unpaid accidental request would take the same amount of energy from Cloudflare. The only difference is that a repetitive unpaid request is probably easier to detect.


Oh, they totally can, but you have to get Cloudflare to be your auth gateway (using e.g. https://www.cloudflare.com/en-ca/teams/access/ .) Usually companies who have this problem already have an auth system in place and well-ossified (i.e. with customers who’ve burned assumptions about it into software like mobile apps, that can’t be “forced” to update.) So it’s pretty hard to do a migration.


I really hate it when I'm trying to spend money at a company and get hit with a captcha box right as I click "checkout". I could see it for selling scarce items like concert tickets, but in general it's very insulting, annoying, and off-putting to me.


Credit card fraud that results in chargebacks is a very significant cost for a lot of online stores. So while it does suck, it isn't the shop that is to blame.


A small micropayment makes for a great way for bad actors to test stolen credit card numbers.


You can't do micropayments with credit cards; they have too much overhead per transaction. You'd need to pay into a service that handles the micropayments, and it would have minimum packages to buy to mitigate these issues.


Once logged in perhaps. But credential stuffing is a thing.


This general direction could be good if implemented using some zero-knowledge technology (zkSNARKs and closely related families are the most commonly applied recently, but ideally it'd be something that doesn't require a trusted setup).

Intuitively, you would provide privacy-preserving offline attestations that e.g. one of a set of trusted parties has verified that you're a legal resident of jurisdiction X or over Y years of age, without needing to disclose any private identifiers.


Cloudflare captchas in particular, and any checks and roadblocks to see something publicly available in general, are terrible, period. It doesn't matter which form they take. Every time you see one you feel like a second-class citizen and get reminded that the internet is no longer what it used to be.

I personally simply close the tab when I see a cloudflare "one more step" page.


This is completely wrong. Site administrators can put any controls they want in place to limit access. I don't know where you get the idea that things on the Internet need to be publicly available or without restriction.

Unless you're an original ARPANET contributor, there have always been attempts to control access and stop attacks. You're making the same mistake every conservative does. Longing for a nostalgia that never existed.


> I don't know where you get the idea that things on the Internet need to be publicly available or without restriction.

That's not what they said at all. They complained about breaking access to things that had already been chosen to be public.


what? Have you ever dealt with a DDoS attack and the consequences on your availability and infra health?


Of course from the perspective of the website operator it’s great, but from the perspective of the user it’s frustrating.

I’m not sure whether this is true, but it seems like with Firefox I get these captchas much more often than with Chrome. Sometimes they’re so difficult to solve it really takes a minute or two to do so, and it’s incredibly disturbing / an unfriendly interaction.

Surely there must be a better way to deal with this? Why do I have to keep proving again and again and again to Cloudflare I am, in fact, a person?


Because you don’t actually want Cloudflare to store your browsing history on their servers. If they did it would be an insane privacy concern. So every time you get there it is like the first time.


Are ddos attacks a common enough occurrence to warrant putting half the internet behind ddos protection? In my impression you need to do something really wrong to deserve one.


That's unfortunately not true at all in my experience. Maybe if you're an anodyne SAAS, but if you host any user-generated content, especially if it's adjacent to gaming (my personal experience was mostly with gaming-related forums and IRC networks), politics or any other charged topic, expect to get hammered on a pretty frequent basis. IoT botnets are pretty easy to rent at this point, so the attack is accessible to every skid known to mankind.

I actually agree with your overall point as I try to use Tor for a lot of "normal" browsing, but I'm not sure what the correct solution to accommodate both is. It's a hard problem, and having been in that position myself I have a hard time faulting small website operators who have no alternative defenses.

e: just to add to this, I see the existence of ddosing as a significant driver towards centralized monolithic services. If your blog on Palestinian rights or whatever is getting hit, that's an incentive to move it to a platform that takes care of networking for you. It's a little absurd to go all-in on decentralized self-hosting without at least an acknowledgement that with current tech and typical personal-computing budgets, doing so is giving a heckler's veto to literally everyone. Cloudflare isn't the only dimension things can be centralized along.


Yes, attacks on the web are very common for any decent sized site. Just because something is publicly available doesn't mean you get unfettered access to do whatever you want.


Yes, they absolutely are. Hell, just a few random scraping bots stuck in a loop or being overly aggressive on your site is enough to double your bill. So yeah, it's 100% required.


Did we collectively forget rate limiting exists or something?

One bot that's just stuck on a loop or being overly aggressive is going to have one IP.
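
For what it's worth, that single-IP case is exactly what a token bucket handles. A rough sketch in Python (the numbers and names are made up for illustration, not anything a particular CDN or proxy actually ships):

    import time
    from collections import defaultdict

    class TokenBucket:
        """Per-IP token bucket: each client may burst up to `capacity`
        requests, then is limited to `rate` requests per second."""
        def __init__(self, rate=5.0, capacity=20):
            self.rate = rate
            self.capacity = capacity
            self.tokens = defaultdict(lambda: float(capacity))
            self.last = defaultdict(time.monotonic)

        def allow(self, ip: str) -> bool:
            now = time.monotonic()
            # Refill tokens for the elapsed time, capped at capacity.
            self.tokens[ip] = min(self.capacity,
                                  self.tokens[ip] + (now - self.last[ip]) * self.rate)
            self.last[ip] = now
            if self.tokens[ip] >= 1:
                self.tokens[ip] -= 1
                return True
            return False  # caller responds with 429 Too Many Requests

    limiter = TokenBucket()
    print(limiter.allow("203.0.113.7"))  # True until that IP's bucket runs dry

Of course, as pointed out downthread, this only holds while the bot sticks to one IP or a small pool.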


That is fair, but instead of bothering to set that up I can sign up for a free service once, in an hour, and never worry about it again.


any sophisticated bots are rotating IPs or pulling from a large pool of IPs. In many cases of fraud/abuse, rate limiting based on IP doesn't work well enough.


> to double your bill

Do you pay a variable amount for your hosting? How and why? All VDS and dedicated server offerings I've ever seen are a fixed amount per month. And more often than not the network is limited by speed, not by data transfer.


In that case then the service will just hang as it won't handle the requests caused by even a simple malfunction (not even an actual attack) like the one mentioned by GP.

Mind you, I see your point and I generally don't like the captchas either, but it is definitely a trade-off and I won't blame webmasters that use the DDoS protection.


When you get DoS'd, you either pay to absorb the traffic or go down. Usually paying a DDoS protection service is the cheapest option, but if you're not, you're paying for more infrastructure (or going down).


How do you mitigate ddos attacks and other bad actors hitting a page?

What does your cdn solution look like?

Route optimization from your (single) endpoint to clients literally half a world away?


As a user, I simply don't care. I repeatedly get punished for doing nothing wrong. It's almost like airport security.

> What does your cdn solution look like?

> Route optimization from your (single) endpoint to clients literally half a world away?

And as a developer, I don't understand this newfangled obsession with CDNs either. Yes, there will be 200 ms RTT in some cases. So what? Get over it. Optimize your website to load in fewer round-trips. TCP congestion control adapts well enough to any latency. RTT only really matters in gaming and VoIP.


that 200ms rtt does matter to users. it becomes very noticeable. especially when you're writing an app, not a brochure site. You need to tree shake so you're not serving a huge spa all at once.

I've seen much worse times for users, and a CDN absolutely helps our staff in Asia dealing with our internal apps.

Of course they're not always tripping up cloudflare and being shown captchas. I almost _never_ see a cloudflare captcha either... huh...


What use is an app that requires a full RTT for every button press?


None? Why would every button press require a full rtt?


>that 200ms rtt does matter to users. it becomes very noticeable. especially when you're writing an app, not a brochure site. You need to tree shake so you're not serving a huge spa all at once.

Because this implies that.


No it doesn't. at all.

Page load speeds by themselves can be painful. Open devtools and have your browser throttle to poor 3g speeds.

Try browsing around, even on well-optimized sites.

Now try uploading a couple dozen files through an api.

This is legit what some users deal with. In New York state even, you don't need to go that far to find poor connectivity.

Even if all your users have awesome home connections, think of salespeople traveling to a client, or on-site inspections at a manufacturer in a warehouse that's mostly metal and has bad wifi.


If you're on 3G I would expect sites to load in a similarly bad way with or without an extra 200ms of RTT at most.


The throttling in dev tools is meant to represent that latency...


If setting it to 3G is supposed to represent just 200ms latency, that's going to give you a very exaggerated and misleading impression of how bad it is. It's a meaningless test.

I thought you were giving an example of how bad connections can get, and saying that the extra latency would make it worse, but in that situation it's a drop in the bucket.


Then what does the 200ms have to do with "writing an app"?


Apologies, your username looks like the one that tossed that number out as what they assumed was a high number.

200ms latency isn't that bad, but I'm seeing more like 800-2000ms latencies with some users depending on physical location. At some point latency kills usability, especially when trying to get through a complicated QA or inventory process.


That latency is literally impossible unless they’re up in space somewhere, beyond satellite orbit. Your latency is most likely caused by CPU on low-end devices. A CDN won’t help you with that (and would probably be harmful, having to manage another TCP connection).


A few years back I was seeing about 15-30s latency on 2G EDGE. Overloaded links can have high latency due to queuing, which CDNs tend to help with.


Any latency above the speed-of-light floor is possible. In this case you’re seeing people in bad places, like rural China, or Long Island...


I see latency spikes like this often on T-Mobile LTE. One second you're under 40ms ping, then your next ping clocks in at 794ms or 2100ms despite having good signal strength.


> Yes, there will be 200 ms RTT in some cases. So what? Get over it.

You're missing a zero in that RTT for users in places like Asia if your server is anywhere in the west. (It's actually somewhat revealing when someone throws out a number like this without any qualification; what exactly made you conclude 200ms is the magic number?)

> Optimize your website to load in fewer round-trips. TCP congestion control adapts well enough to any latency. RTT only really matters in gaming and VoIP.

You don't need a very big imagination to think of cases where RTT has a significant impact, e.g. when you need to issue multiple sequential requests that depend on one another. These are unavoidable and occur often in more than just websites: anything that uses HTTP as an API (a very simple example is recursively downloading dependencies).

This comes across as a classic "I don't actually understand the problem domain very well at all, but get off my lawn" answer to the problem.


> You're missing a zero in that RTT for users in places like Asia if your server is anywhere in the west.

Well, it does say 130 ms in here: https://www.quora.com/How-long-would-it-take-for-light-to-fl...

And that's around the planet, to go around and end up at the same spot. In practice, with sanely-configured routes, your packets should never need to traverse more than half that distance. So, divide it by 2, then that cancels out because RTT is a measure of how long it takes for a signal to travel back and forth. You then add some time on top of that to account for buffering and processing in the various equipment along the way.
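
For concreteness, the same back-of-the-envelope arithmetic in Python (rounding Earth's circumference to ~40,000 km, and using the roughly 1.5x slowdown of light in fiber that a sibling comment brings up):

    # Theoretical minimum RTT for an antipodal round trip, ignoring routing
    # detours, queuing, and per-hop processing. All figures are rough.
    CIRCUMFERENCE_KM = 40_000
    C_VACUUM_KM_PER_S = 300_000
    C_FIBER_KM_PER_S = 200_000   # light in fiber is ~1.5x slower than in vacuum

    one_way_km = CIRCUMFERENCE_KM / 2   # worst case: half the globe each way
    rtt_vacuum_ms = 2 * one_way_km / C_VACUUM_KM_PER_S * 1000
    rtt_fiber_ms = 2 * one_way_km / C_FIBER_KM_PER_S * 1000

    print(f"vacuum floor: {rtt_vacuum_ms:.0f} ms")  # ~133 ms
    print(f"fiber floor:  {rtt_fiber_ms:.0f} ms")   # ~200 ms, before any buffering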

> These are unavoidable and occur often in more than just websites: anything that uses HTTP as an API (a very simple example is recursively downloading dependencies).

If you mean REST API requests, the kind that trigger some code to dynamically generate a response, how would a CDN solution like cloudflare help? The request still needs to get to the server and the response still needs to come back, all the way, because that's where that code runs. CDNs only really work for cacheable static content, don't they? I mean it's in the name.

A blog or a news website certainly doesn't need a CDN.


I live in New Zealand, definitely a first world country with first world infrastructure. Ping times to US East or Europe are over 300ms right now from my home[1].

My old business had users in countries around the world, and the assets were highly optimised for speed. However adding CloudFlare (a) significantly sped up our service to clients, especially those in Asian countries, and (b) significantly improved reliability of connections because CloudFlare have their own dedicated network links between countries and/or optimised for reliability.

[1] https://www.cloudping.info/


There's no need to guess based on the speed of light. Test it yourself:

https://www.cloudping.info

For me, the highest was 310ms round trip to Singapore, so higher than your estimate but not too bad.

But this is completely beside the point. As far as I know, if you're using a CDN effectively (i.e. a large proportion of requests are hitting cache), it should be cheaper than having all requests hit your server, not more expensive. So even if you don't "need" a CDN, you might want one. This is orthogonal to the issue of bot traffic, which exists whether or not you use a CDN. If you want to use a CDN but don't mind the costs of bot traffic, you can configure CloudFlare to not show the CAPTCHAs, or use a different CDN.


338 ms to Sydney is my worst. 234 ms to Singapore.

AWS does offer a CDN, right? Somehow they do it without captchas and without me ever noticing. So I'm somewhat justified in cursing at Cloudflare, because it's the only one actually announcing its presence by actively disrupting your browsing.


CloudFront, the AWS CDN, is not equivalent to the part of Cloudflare that's showing a captcha. You only see the captcha when the request would have hit your server, because Cloudflare is proxying it, not serving cached results.


> Well, it does say 130 ms in here.

If you have a fiber-backed, all-switched network with no routing, buffers, congestion, or detours, you may get that value, if you're lucky.

Pinging tty.sdf.org, a direct-access shell service in the USA, from somewhere between Europe and Asia on an academic network backbone round-trips in ~190ms. The whole journey traverses a little less than half the globe. By your math, it should be around ~60ms, but it's not.

> If you mean REST API requests, the kind that trigger some code to dynamically generate a response, how would a CDN solution like cloudflare help?

By using Cloudflare workers, so your code is also distributed around the globe?

> CDNs only really work for cacheable static content, don't they? I mean it's in the name.

JS files are also static content. Even if you don't use code distribution like Cloudflare Workers, a simple CDN can cache 90% of your site if not more: CSS, images, JS, HTML, you name it.

> A blog or a news website certainly doesn't need a CDN.

Actually, CDN is the most basic optimization for distributing heavy assets like videos and images, which news websites use way more than text. Why not use a CDN?


> Pinging tty.sdf.org, a direct-access shell service in the USA, from somewhere between Europe and Asia on an academic network backbone round-trips in ~190ms.

Actually I get around 200 from Russia which is also "somewhere between Europe and Asia":

    round-trip min/avg/max/stddev = 196.504/197.581/199.833/1.360 ms

> By using Cloudflare workers, so your code is also distributed around the globe?

Great, let's give that company even more control. That's sure gonna end well.

> CSS, images, JS, HTML, you name it.

It all gets loaded once and then cached in the browser. The initial load takes long regardless of whether there's a CDN. Oh, and many websites also use stuff from like 10 different domains, which doesn't help this either.

And, it doesn't matter whether a JS file loads in 50 ms or 300 ms, if it then takes 5 seconds to parse and start running.

> Actually, CDN is the most basic optimization for distributing heavy assets like videos and images, which news websites use way more than text. Why not use a CDN?

So put them on a separate domain and serve that from a CDN if you really care whether that stock photo no one notices loads in 500 ms instead of 2000. That still doesn't explain why anyone would put their main domain behind Cloudflare.


> Great, let's give that company even more control. That's sure gonna end well.

We have enough evil companies who invade our lives through the platforms they develop. Cloudflare is not one of them. Using them is voluntary (on the part of the service providers), and I think they're one of the more useful companies around.

BTW, I'm not a web developer or Cloudflare employee. I have no skin in this stuff, however they build some cool stuff inside the Linux kernel, which is interesting from my PoV.

> It all gets loaded once and then cached in the browser.

Then cleared and/or invalidated by the user, or by the browser's own logic, for a plethora of reasons.

> The initial load takes long regardless of whether there's a CDN.

Actually, no. A reasonably fast internet connection (>12 Mbps we can say) can load a lot of things very very fast. The biggest overhead is DNS, even with CDNs. With a good local, network-wide DNSMasq installation, if the server is close, I can load big sites almost instantly.

> And, it doesn't matter whether a JS file loads in 50 ms or 300 ms, if it then takes 5 seconds to parse and start running.

I think 5 seconds is a long time even for the old Netscape Navigator's JS parser. You'd need to run something akin to Skynet to parse a JS file for a straight 5 seconds. How's that even possible?

> So put them on a separate domain and serve that from a CDN if you really care whether that stock photo no one notices loads in 500 ms instead of 2000.

I don't know about you, but world news generally contains live/new footage or fresh photos from the ground, not stock photos. Also, we humans are visual animals. Many people want to see the images first and read the text later.

> That still doesn't explain why anyone would put their main domain behind Cloudflare.

Load balancing, DDoS protection, CDN, workers, Bot/Scraping protection, cost reduction, rate limiting, you name it. Even my DSL router implements some of the protections, to my surprise.

The Internet is not the same beast now compared to the 90s/00s. I miss the simpler times, but alas.


Of course, light doesn't travel at vacuum speed in fiber; it travels a bit slower, and it also bounces around inside the cable, so it ends up covering a significantly longer distance. Multiply by around 1.5 for more real-world numbers (copper is similar), just measuring raw distance.

And switching adds significant delay, especially when you get out to the edge.

> If you mean REST API requests, the kind that trigger some code to dynamically generate a response, how would a CDN solution like cloudflare help? The request still needs to get to the server and the response still needs to come back, all the way, because that's where that code runs. CDNs only really work for cacheable static content, don't they? I mean it's in the name.

For a simple example, imagine assets that are dynamically loaded based on feature detection in JS. All of the assets (js included) can be cached on the cdn.


Most end users are not in a position to control the path their packets take. Their ISP could route their packets around the world 6 times over if they can save a nickel doing it


I don't think you understand why that captcha is there in the first place then.

Cloudflare prevents a bunch of crap that site operators just don't want to deal with. Especially for smaller sites that are run by one person. Dealing with a wordpress site getting hacked because you missed an update by a day, or a bulletin board getting swarmed with bots, or some asshat ddos'ing your site because you banned them. Suddenly that site just isn't worth running.

Complaining about a thing that prevents that headache because it's a minor inconvenience to you is so self-centered it boggles the mind.


Yeah, so centralizing the entire internet around a black box that sees all your traffic in cleartext is clearly the right solution. /s

> Dealing with a wordpress site getting hacked because you missed an update by a day

Maybe don't use something this vulnerable then, and don't rely on a third party to protect you from exploits.

> or a bulletin board getting swarmed with bots

Maybe require email verification and/or a captcha when signing up or posting. Don't punish people for passive actions.

Somehow, there are many forums that aren't behind cloudflare, yet there are no spam bots.

> or some asshat ddos'ing your site because you banned them

Is DDoS really such an everyday occurrence?

I just don't understand. I run a personal website. There's literally nothing to "deal" with. I set it all up once and it works. I only have to pay for the server and for the domains on time.


Monocultures are always bad, but I don't see any alternative services with this level of ease of use.

You're definitely overestimating the technical expertise/available time of a lot of small-time admins out there.

You don't see bots and spam on those forums either because they are actually using Cloudflare and you're just not seeing the captcha, or because on the backend they're feeding all their posts through Akismet (in plain text). I don't think you're considering how many services see your posts, even when you don't trip a captcha.

Email accounts are trivial to sign up for, especially for bots. I always recommend charging $1 (or the local equivalent) for an account; that's a lot harder to fake.

My point in all this is that bitching that a site is using Cloudflare so it doesn't have to deal with crap is a self-centered view.

Saying "well it never happened to me, so it must never happen" is similarly self-absorbed.

Maybe consider that your experience is not everyone's experience.


>prevents a bunch of crap that site operators just don't want to deal with

>I don't see any alternative services with this level of ease of use.

Both of these boil down to laziness, IMHO. And given how awful they are for half the internet, that's not okay to me. Not as a user or as a developer.

When a user requests a site, they expect to get the site. When you disregard that utterly reasonable expectation and force them to train someone's ML algorithm for free for a minute or two, you utterly destroy the user experience.

If that's my experience as a user, your entire site gets an instant 1/10 rating from me. I'll likely leave instantly and never return.


> My point in all this is that bitching that a site is using Cloudflare so it doesn't have to deal with crap is a self-centered view.

Who is serving whom here?

If a business thinks it's ok to impose Cloudflare inconvenience on me, the customer, for the privilege of giving them my money, who is self-centered here?

The simple answer is I'll close the tab and go buy it from a competitor. I'm not playing captcha games to buy something.


For regular public web pages, serving the actual fucking page should not be more expensive than serving the captcha page! What the hell is a "bad actor" in relation to GET requests to a public page? To a public page, all actors should be inherently neutral.


There are a lot of potential side effects. Not every GET request retrieves a static asset. A list view with filters, for instance. Either way, you could be dumping arbitrary data along with a request. Or just try fuzzing parameters. Some pages might be poorly done GraphQL endpoints, and you might find your db tied up. There are MANY ways a legit GET request can cause issues, let alone someone with bad intentions.


All users are hostile users and all users are preferred users. Define what your system will allow through rate limits and caching. Assume users would destroy your site if given the chance, because they will. If you are exposing private data through GraphQL, configure it properly, or drop the private data, or drop GraphQL and use a backend.


I 100% agree you should be looking at all incoming traffic as hostile or at least potentially hostile. The other comments are contending that there's nothing a hostile user could do to a 'public' page.

One of the things using cloudflare gets you is all those protections without having to know how to do them yourself. Which a lot of the developers don't know how to do.

There's also something to be said for catching a lot of this at the network layer on a cluster of machines that can handle any incoming traffic that your one poor neglected vm can't.


If you use Cloudflare you give up the freedom not to force captchas. If you avoid them you can choose where you want captchas. When Cloudflare goes down you go down unnecessarily.

If you're worried about a DDoS attack, Cloudflare or another provider might be a good choice. But adding DDoS protection by default seems unnecessary. In my 20 years of running 100s of sites I haven't run into a situation where I needed DDoS protection. The vast majority will never be the target. Once in a while Google or Bing will effectively DDoS you, but using Cloudflare to block that seems like overkill.


Cloudflare gets you very paranoid, overly aggressive, user-hostile protections by default – and people just take the defaults even if they use it with STATIC HOSTING LIKE GITHUB PAGES!


There are tradeoffs, yes.

We might weight those tradeoffs differently. That's ok.


> How do you mitigate ddos attacks and other bad actors hitting a page?

Not sure what "bad actors hitting a page" even means. I host public info so people can see it, be it good or "bad" people. Let them see it.

DDoS is different and can be devastating, of course. Also, very rare. In decades of hosting content (I started my first hosting business in 1994) I've never experienced anything remotely like a DDoS. I know it happens, but it's definitely very rare for most people. Driving tons of legitimate users away with relentless captcha annoyances for the once-in-a-lifetime possibility of a DDoS is not a good tradeoff.

If you're in a business that attracts DDoS like flies then deal with that, otherwise lay off the captchas.


> Not sure what "bad actors hitting a page" even means

I would refrain from commenting on this topic then.

> I host public info so people can see it, be it good or "bad" people. Let them see it.

If your site ever gets big enough, you'll understand.


Why is each and every Tor (and sometimes VPN) user deemed a DoS attack... it discriminates against users who value privacy by forcing hCaptcha on them by default. Worst of all... it could be a de-anonymization attack as well, hence why I, as a regular Tor user, just exit the page immediately when that happens.

For any of my pages that do happen to use Cloudflare, I am luckily able to disable this discrimination in the CP so kudos for that at least, but terrible defaults imo.


Because that's a not-insignificant portion of the traffic they see from Tor and VPNs?

Tor has some absolutely valid and important use cases, but what percent of Tor exit traffic is actually someone trying to keep their traffic anonymous from the eyes of an oppressive regime, and what percent is script kiddies, or someone hiding torrenting from their ISP?


Are you one of those people that answers the door with a gun, even when you’re expecting a friend?


From experience, traffic via Tor was always 99%+ fraud.


You can conduct fraud by accessing public, read-only web pages? You can conduct fraud by searching on Google?

Those are the two I find repeatedly blocked when accessing via Tor. The former by Cloudflare, the latter by Google.

I use Tor to lookup phone numbers that have just called me, to decide whether it's a good idea to answer. Since I don't want to be personally associated with such numbers I prefer to search anonymously. But often it's impossible to get a result.

Sometimes even spending 5 minutes solving captchas isn't enough. (I'd only spend that long to see if it's just an outlier. No, it's quite common.)

This creates an immense pressure to tell various services exactly who is phoning me, which is a terrible attitude to privacy.


Then don't use sites that are behind cloudflare?

It's not your choice if the site owners/admins use cloudflare. It IS your choice not to use those sites.


In practice the information I'm looking for is behind Cloudflare. There are other sites; they tend to lack the information.

There is no "don't use" if I want to get my task done.

I can choose not to obtain the information, but then I still have the problem I started with.


For the companies I worked for, we usually allowed Tor as read only. But net ops might override that, particularly when things moved to https traffic.


Well, if you keep throwing impossible captchas at them, no wonder that normal users just close the tab, but bots and fraudsters keep trying.


We're solving the wrong thing here. We should be going directly to the source: the misbehaving (for whatever reason) IP and the ISP that owns it. We should be fostering an ecosystem that solves that problem, not one that results in a Cloudflare popping up, one which proverbially just throws spaghetti (money, technical expertise, and infrastructure) at the problem and then uses that accumulated power to shape (or manipulate) the web to its liking, and to kick anyone it doesn't like off it, of course.


Complete aside, but I'm still not certain I understand the technical details of why Cloudflare can't uniquely identify users. I thought I knew how hardware keys worked, but apparently I don't.

If the key being shared is embedded in the device, even in a secure enclave or something, then my understanding was that would open the door for key extraction. If the key is unique per-device, then that's not a problem. But if the key is unique per-10,000 and stored statically on the device, then hacking one device means that key can be released to anyone and the entire pool can be imitated.

So if the above is correct, it can't be that a single private key shared across the entire company is stored on the device because that key would be getting constantly extracted and leaked by some determined hacker somewhere. But if it's a unique key per-device, then... I just can't figure out how validating that key wouldn't require transmitting unique information to somebody, whether it's Cloudflare or the device manufacturer.

Where am I going wrong? I feel like I'm misunderstanding something fundamental about how signing works on these devices, but I can't figure out what it is. If I buy a Yubikey, is it connecting to the manufacturer's servers and getting a new key each time it's used? I thought they worked offline.

Or are secure enclaves just much more secure than I think they are? Are we assuming that it's impossible to extract a private key from one of these devices?


https://fidoalliance.org/fido-technotes-the-truth-about-atte... explains this pretty well.

Basically:

* Attestation keys are not unique per authenticator; they're shared among batches of authenticators.

* If you extract the batch's attestation key, you can imitate authenticators from that batch. That doesn't mean you can authenticate as a registered authenticator, of course; it just means you can pretend to be a "Yubico XYZ" device.

* Yes, I think Cloudflare is assuming it's hard to extract the attestation key. I think this is basically a safe assumption, but if it isn't, they can always choose to distrust batches known to be compromised.
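
To make the "distrust compromised batches" point concrete, here's a rough, hypothetical sketch of the relying-party side (not Cloudflare's actual code; the revocation list is invented, and a real implementation would also verify the attestation signature and certificate chain):

    # Hypothetical sketch: compare the attestation certificate (shared by an
    # entire batch of authenticators) against a list of compromised batches.
    # Certificate parsing via the `cryptography` package.
    from cryptography import x509
    from cryptography.hazmat.primitives import hashes

    # SHA-256 fingerprints of attestation certs whose private keys are known
    # to have leaked; filled in from incident reports.
    COMPROMISED_BATCH_FINGERPRINTS = set()

    def batch_is_trusted(attestation_cert_der: bytes) -> bool:
        cert = x509.load_der_x509_certificate(attestation_cert_der)
        return cert.fingerprint(hashes.SHA256()) not in COMPROMISED_BATCH_FINGERPRINTS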


Thanks, that's really helpful. Followup questions though:

- Does this mean if I buy 2 of these devices at the same time, it's possible for me to get the same attestation keys on both devices? I guess depends on how many batches at a time a company is producing.

- Doesn't this mean that attestation keys will get more unique over time as devices from the pool fall out of circulation and become rarer? Are keys rotated to prevent that (ie, would a manufacturer ever re-release a new pool with the same keys as an old one)?


- Yes.

- To my (limited) knowledge, yes, you are right that keys will get more unique over time. That's a very good point. Keys are not rotated nor (generally) are they rotatable; they are usually read-only. If you are using a very old FIDO device and worried it has too much entropy now--like, if it's a "rare" or "vintage" device!--then you should buy a new one, I guess?

(I honestly have not thought about your second point before, but I am not really deep in the FIDO stuff. So take my answer with a grain of salt.)


Manufacturers don't necessarily have to rotate the keys on older devices; they could rotate the keys on newer devices such that it's difficult to reliably tell what batch/generation a newer device is from, because it could be using a newer or older key.

Such behavior would require some way of revoking old keys from newer devices to prevent a situation where a compromised and blacklisted old key is selected and causes the CAPTCHA to fail, seemingly at random.


I don't totally follow. The key is baked into each device, so if you sold a mix of devices where some had the old key and some the new, revocation of the old key would brick the new devices with that old key.


I understand from [0] that the attestation key is shared across all instances (SNs) of the same model (PN): "...For example, all YubiKey 4 devices would have the same attestation certificate; or all Samsung Galaxy S8’s would have the same attestation certificate". So you would not need to buy them at the same time.

But of course, despite this, a unique key is still generated for each identity upon sign-up [0]. I am not sure about (as in 'have no knowledge of') the entropy for these devices.

[0] https://fidoalliance.org/fido-technotes-the-truth-about-atte...


> they can always choose to distrust batches known to be compromised.

Which effectively means bricking the devices of 9999 innocent users each time.

Why are we creating a world where users will be told they can't visit a website or access their account any more because they didn't spend enough money on a hardware DRM device which tries to hide a key from them?


Hmm. I'm of two minds about this.

Vanguard restricts (or used to restrict) trusted keys to those made by Yubico. This annoys me, given they let me use any phone number. So this suggests (as you seem to be arguing) that attestation is an attractive nuisance—when you can't tell whether a phone number is with a trustworthy or untrustworthy exchange, telco, or VoIP service, you just have no choice but to trust it. But when you can enforce attestation, some Vanguard employee with a Yubico Certified Security Engineer certificate (I made that up) decides to restrict us to Yubico's keys.

On the other hand, FIDO aims to also support enterprise usage, and I think it'd be a nonstarter to say that enterprises shouldn't be able to restrict you to officially issued authenticators—and without remote attestation, they'd have to do more cumbersome steps like handling enrollment centrally (and then shipping enrolled authenticators to users), which, in a distributed workforce, sucks.

So attestation fills a need, and I think we just have to hope it's not widely used or abused.

Is Cloudflare's usage an abuse? Eh, I don't know. It's an interesting experiment.


> FIDO aims to also support enterprise usage

It seems to me that this use case is completely at odds with the general use cases we expect when people browse the web.

In the enterprise, society accepts the idea that someone's employer knows their name, their address, their bank details, and even everything they browse on their company-issued computer. Whether that's good or bad, it's not a model that FIDO should be trying to support.

This reminds me of the controversy around TLS 1.3 requiring forward secrecy.[0] If companies want to make their own alternative to FIDO which has DRM and user tracking, to control access to their intranets, that's fine, but we shouldn't have allowed FIDO to become a Trojan Horse (or Trojan Dog?) to sneak this anti-feature into the general web.

[0] https://www.eff.org/deeplinks/2019/02/ets-isnt-tls-and-you-s...


Let me give you a little more of a fleshed-out use-case, then, as well as some alternative designs that come to mind.

Let's imagine I'm a consumer bank. I want the following features:

* I ship users a FIDO key when they sign up for an account.

* Users can register additional keys they bought or already have, and can use my key elsewhere.

* I consider malware on the user's computer to be in the scope of my threat model, and so I want all risky transactions (like transferring money) to come with a secure test of user presence.

I believe that the above three constraints are sufficient to motivate the current design. Why?

Well, let's say we do the obvious and just eliminate attestation. If the design otherwise remains unchanged, there's nothing to prevent clever malware from piggybacking a legitimate user presence tap to add a malicious, software-only authenticator to the user's account, which can then approve future transactions.[1]

Currently, however, I can simply require that authenticators all be non-self-signed, restricting to those that come with real hardware-backed security.

So now I have a few options. I can:

a) Disallow enrolling new authenticators. I think this is the worst option, because it means that the FIDO ecosystem would basically revert to (at worst) "an authenticator per relying party", which is expressly something it aims to avoid. (Think, keyrings full of USB fobs!)

b) Declare malware out of scope. I find this unsatisfying, since it suggests we might as well just replace FIDO with USB keys with X509 client certs, but see my footnote [1]--maybe at the limit these are equivalent?

c) Do what the FIDO Alliance seem to have done: create batch attestation, but discourage RPs from using it spuriously.

I think (c) is fairly reasonable, but as I said, I do find the entire malware argument sort of unsatisfying in a way, so, shrug.

[1] Arguably, if this is an attack you are concerned about, malware can also just piggyback presence to directly trigger transfers and other abuse, of course, so maybe this argument is a weak one? But I think an RP might reasonably want to say that "malware can do anything malicious within 10 seconds of a tap on your key, but if you think someone might have hacked your account, just remove your key and call customer support." If malware can add new software-only authenticators, it violates this functionality.


> restricting to those that come with real hardware-backed security.

And presumably RPs in practice achieve this by subscribing to a list of key updates managed by the FIDO Alliance themselves. That seems like it puts a lot of control over the web into the hands of a group whose incentives may not be the same as those of the average web user.

As a point of comparison, my impression is that it's been quite difficult to get all Certificate Authorities to correctly follow the CA/Browser Forum rules, and to tighten those rules. That forum at least tries to balance the wishes of User Agents against those of the entities selling (access to) the keys, which I'm not sure if FIDO will achieve.

> malware can do anything malicious within 10 seconds of a tap on your key

If this is the security situation that FIDO aims to create, then I'm not sure if requiring a whitelisted hardware device adds anything, relative to allowing self-signed software-only authenticators, other than security theatre and a single point of failure (namely the whitelist itself).


> And presumably RPs in practice achieve this by subscribing to a list of key updates managed by the FIDO Alliance themselves. That seems like it puts a lot of control over the web into the hands of a group whose incentives may not be the same as those of the average web user.

I'm not aware of the FIDO Alliance doing this today. I guess anyone could create such a list, but while I think your concern is valid, it has not (yet) played out that way, so I don't think you can claim it's inevitable.

Anyway, I laid out what I believe are the reasonable alternatives. I think you're taking the position that local malware is already sufficiently powerful that it's not meaningful to restrict adding new authenticators?

I entertained this as well, and I think there's validity to it--but I do think it's reasonable for an RP to say, "If you unplug your FIDO key, the attacker is thwarted", which is not something that remains true if you allow self-signed authenticators.


> Are we assuming that it's impossible to extract a private key from one of these devices?

Nope, it's just "expensive" to extract keys from secure hardware like this. The problem with an approach like this is that when the secret keys are identical across a large number of devices, the cost of revoking a compromised key goes up significantly. For a spammer, that increases the value of obtaining said key, because of the likelihood that the key will remain usable for much longer than if it were unique to each device (and could be very easily blacklisted).

Techniques to extract data from these sorts of secure devices include various forms of side-channel analysis; decapping and microprobing the IC, or using an SEM, etc., to physically damage parts of the circuit to try to force it to disclose the key; and various forms of power and clock glitching.

Most decent hardware-based cryptosystems are designed to ensure that each device has a unique key, so that the cost of extracting one key (let's say around $100k) is too high for a potential attacker if the key can just be blacklisted. But if the key is expensive/impossible to blacklist, then that cost might be worthwhile to an attacker.


> Where am I going wrong? I feel like I'm misunderstanding something fundamental about how signing works on these devices, but I can't figure out what it is.

Whilst Fast Identity Online (FIDO) is much more than WebAuthn, Cloudflare's proposal here is to use WebAuthn to get rid of CAPTCHAs. The official WebAuthn doc is surprisingly accessible with neat illustrations for key topics: https://w3c.github.io/webauthn/ (ref registration and authentication ceremonies, in particular)


Re uniqueness - given how FIDO2 tends to work, I'd also expect it to be RP ID (ie domain) scoped. Possible that's not true for the attestation/registration side though?


There are always CAPTCHA bypasses if you're willing to pay; there've been sites operating for decades that will take a captcha URL and spit out the appropriate response by just feeding it to humans. This is just a different way to make you pay - and arguably one of less ill repute: buying more U2F keys once yours get banned.

This provides effective rate limiting and you can still get every key you automate banned very easily.


If I understand it correctly, you cannot ban an attestation key without potentially banning lots of legitimate users.


Second this.

For a major sporting event, one of our sites was heavily targeted by “free TV streaming services” self promoting their stuff.

No amount of Google CAPTCHA or Cloudflare could stop it while keeping it online. Never seen anything like it in my life.


That makes me so frustrated.

I HATE CAPTCHAs with a passion. They are everywhere and constantly slow me down. And as you mentioned, they are likely not helpful in stopping bots.


I almost never see a captcha. On a static fiber IP - 1GB. Use chrome. Not sure if that matters.


> Use chrome.

That's why. Try using Firefox without being logged in to Google and with an ad blocker - you really won't like it.


I’m using Firefox, and I’m often doing stuff in Private Mode, where I’m not logged into Google, and am using uBlock and Firefox’s strict tracking protection. CloudFlare and Google don’t serve me any CAPTCHAs.


This is what the blog post fails to address. Is the FIDO2 hardware key approach abusable? Yes. More so than regular image-based CAPTCHAs? No. Like you said, there are well-established services for "mechanical-turking" away the problem.


I’d rather take these tradeoffs than do 5 steps of reCAPTCHA because I’m using a VPN for work, which, as Cloudflare’s announcement said, is an experience very localized to North America and likely extra complicated for those outside the region.

In theory, couldn’t Yubikey begin reducing batch sizes to 1,000 and Cloudflare mark specific batch numbers as requiring one extra step to verify? The vast majority of Yubikey sales will be for real people in any case.


The batch size requirement is imposed by the FIDO spec, to ensure that batch IDs are not so high entropy as to pose a privacy problem.

"In this Full Basic Attestation model, a large number of authenticators must share the same Attestation certificate and Attestation Private Key in order to provide non-linkability (see Protocol Core Design Considerations). Authenticators can only be identified on a production batch level or an AAID level by their Attestation Certificate, and not individually. A large number of authenticators sharing the same Attestation Certificate provides better privacy, but also makes the related private key a more attractive attack target."


Wouldn't reducing batch sizes make privacy even more of a problem? Now instead of a 1/100000 chance of the user being the same person on another website there would be a 1/1000 chance.


Yeah, except that because the other 999 users probably wouldn't be using Tor to access the same websites, in practice this would be pretty much guaranteed to give a highly accurate, persistent tracking identifier.


And if Yubikey could reduce batch sizes - could they require bulk non-wholesale orders to retain the same batch ID to reduce likelihood of abuse?


This won't work because guess what? Bad actors have money and means to buy as many devices as needed through individuals.

Most of the abuse that Cloudflare protects against is also usually illegal. And those taking the risk of doing something illegal usually do it because it's highly profitable, so making some authentication devices a bit more expensive won't make any difference.


I hate this new web where you're automatically assumed to be some malicious actor only because you don't accept cookies and strange third party code and then have to jump through hoops to show that you're not some evil bot. To be honest, if a website immediately throws some Cloudflare anti-DDoS thing in my face I don't even bother anymore.


We all do. Everybody who's ever worked in security hates that even the tiniest hole in your security will be squeezed through. If you don't distrust every single packet, then sooner or later one of those packets is going to destroy you.

It's basically the same both ways. You don't trust them with your private info. They don't trust you, either. The easiest way is, indeed, to just call the whole thing off.

Everybody would love an alternative that lets more get done with less trust. Sometimes they find them, for limited cases. But nobody's solved it for the general case.


I agree. Browsing the web can be super frustrating when there is a CAPTCHA every other page.


Even if we ignore the technical reasons, for me CloudFlare's proposal fails at their "Associate a unique ID to your key" property, where they say CloudFlare could, but won't do it. If they implement this scheme they start normalising this approach. Once it gets to FB's and Google's implementations, their answer will be: we could, but we... look! a squirrel!


Their document says, correctly, that the means by which they could try to do this would be to shove the arbitrary random ID they get into a cookie.

You may have noticed that both Facebook and Google already use cookies. Did you know Hacker News has a cookie too?


> Did you know Hacker News has a cookie too?

There's a difference between being logged in to an account on a single site, and letting Big Tech track you across most of the Internet.


WebAuthn isn't "letting Big Tech track you" which is why, as Cloudflare explains, they would need to store the ID in a cookie to remember what it is.

You might be thinking "Duh, they can ask the Security Key". Nope, the Security Key hasn't the faintest idea; remembering that ID is the Relying Party's job, and the Security Key isn't going to remember it.


I'm so fed up with reCAPTCHA. ~90% of the time it doesn't work on desktop Safari (I can see CORS errors in the console), so I have to use a different browser. Even Gumroad won't let me buy things due to this. It really feels like an anti-competitive "bug" (read feature), and is so annoying it's hard to not just give up and use Chrome.

I feel like I'm crazy – no one else complains. I've mentioned @GumRoad on twitter but nothing.


You may wanna check your computers for viruses. Getting captchas often is usually a sign your IP address has been up to some shady activity.


Thanks, this has crossed my mind. I guess the only way to know is to check network traffic. I update all my devices within a week of every release though, and run macOS/iOS. I don't think it's the cause, but you never know.

I think reCAPTCHA just flat out breaks any form on macOS Safari. I've had a few work, so I'm guessing they changed something and it requires special configuration by the website using reCAPTCHA for macOS Safari to work (but apparently no one does it). I can't even login to Gumroad for example. I know macOS market share is small, but damn, this has to be costing businesses money.

--------------

Okay I finally researched this. It looks like it's because 'strict-dynamic' isn't supported. https://bugs.webkit.org/show_bug.cgi?id=184031.

I can't imagine how many people have switched to Firefox/Chrome because of this issue. You basically can't register for anything, since reCAPTCHA is used practically everywhere.


Is CAPTCHA a necessity only on the ad-sponsored web? Is there another compelling use case for it?

Can we make CAPTCHA obsolete with a decent micropayments solution, where you pay for every transaction with every website, just like we pay for every drop of water we use? Perhaps ISPs could handle it for us?


How do you know whom to pay?

Sure, you are paying for every drop of water, but what if you really want to pay for water from a specific region, don't want water from another region, and need to trust that the water company does not keep a cut or rip off either of you?


Https with the origin?


I can't see that being very popular. Even if it doesn't actually cost you much in absolute terms, billing per page will make people a lot more reluctant to explore new content.


This is an often neglected benefit of "Unlimited" plans. It changes the feeling of consuming. You have already paid so you may as well enjoy instead of asking "Do I really want to pay for this?" at every use.

From a technical point of view it is possible. Assuming that the payments were mediated by some party, that party could issue statements like "this user has used their monthly allowance but they would pay". Assuming that this provider is widely trusted, websites may treat this as a "real user" and allow the visit. (This is roughly how https://coil.com/ works.) Of course there are negative implications, such as making it very difficult for new providers to get started.

You can also imagine some type of smart contract where the subscription fee is split at the end of the day or month amongst the visited sites. Upon a visit, the sites just get a token for one share. Of course this would need to be very carefully designed to prevent abuse. (For example, one malicious client splitting their subscription across millions of pages.)


Captchas, or some anti-bot software, are still needed whenever we deal with credit cards, because we are still using the obsolete model where, if you get your hands on the numbers, you can charge any amount you want to whomever you want, instead of a model where your card digitally signs the payment request for the given amount and receiver, which would make any theft pointless.

Anti-bot measures are also used to try to prevent password guessing on, e.g., the login page to Gmail.

Finally, some places offer things like tickets that go very quickly, in which case having a bot reload the page means the tickets are likely to go to somebody owning a bot rather than a fan of the performer.

None of these cases are solved by payments; they are solved by client-side certificates and, in the last case, by requiring the name of the people who are to use the ticket.
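
A tiny sketch of the signed-request model being described, purely for illustration (real card schemes like EMV work differently, and a real protocol would also need a nonce or timestamp to stop replays):

    # The card holds a private key and signs (amount, payee), so knowing the
    # card number alone lets an attacker charge nothing. Names are illustrative.
    import json
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    card_key = Ed25519PrivateKey.generate()        # would live inside the card's chip
    issuer_view_of_pubkey = card_key.public_key()  # registered with the issuer

    def sign_payment(amount_cents: int, payee: str) -> bytes:
        request = json.dumps({"amount_cents": amount_cents, "payee": payee},
                             sort_keys=True).encode()
        return card_key.sign(request)

    # The issuer verifies against the registered public key; changing the
    # amount or payee invalidates the signature (raises InvalidSignature).
    sig = sign_payment(1999, "example-merchant")
    issuer_view_of_pubkey.verify(sig, json.dumps(
        {"amount_cents": 1999, "payee": "example-merchant"},
        sort_keys=True).encode())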


There is no perfect solution, but I'm in favor of anything that's a net improvement in accessibility for disabled people, even if it's not ideal in some other way. So I'm disappointed to see this solution being shot down before it even gets deployed on a large scale.


Right before large scale deployment might be the last moment it's possible to prevent the large scale deployment.

Unfortunately, corporations are not good at taking a step back if the step forward is good for their business.


The trouble is that this change could be good not just for Cloudflare's business, but for people. If it turns out that this new CAPTCHA alternative is an improvement for users, but hurts some businesses who have to put up with a new form of abuse, I think that's a net win. Let's not stop it before it has a chance.


I agree, but what if it's not, and taking a step back is then refused? Is there really no other way of testing this than large-scale deployment?


Cloudflare is the professional wall builder you hire to protect your garden.

Tech monopolies have always had a vested interest in locking up user data, dictating the policies, and enforcing their own ownership rights. It used to be that only the largest and most sophisticated companies had the resources to shield that data, but Cloudflare changed all that. Walls are now trivial to set up, and virtually unbreachable, and that has forever changed the character of the internet by enforcing monopolistic policies with such technical precision that they're virtually impossible to overcome.


No offense, this framing is so dumb. I hate it.

The ‘Internet 3.0’ isn’t coming because of Cloudflare. It’s coming because these monolith big tech companies have an army of engineers who have been centralizing and building it this way for years.

Cloudflare didn’t build these walls; it’s more of a giant boat now navigating them, because other companies have no choice.

I like to think of them as a giant data ferryman in this regard, versus “a wall builder”.

I’m not saying frustrations aren’t warranted but — like come on — have a little perspective of what’s really happening with the Internet and who is actually driving it.


Clearly Cloudflare isn't responsible for the data centralization that is corrupting the internet. They are, however, a very sophisticated and efficient enforcer of those policies. They've helped ensure large portions of the web are no longer crawlable, and that serves to consolidate information and power in those tech monopolies.


Why is it assumed the web ought to be crawlable?


So that we can find things on it without prior knowledge.


You already get a lot of defaults from your vendor (OS, device, browser). Why couldn't these vendors maintain a bigger list of default entrypoints for the web? They already do for "apps".

Plus there is already an endless sea of walled stuff (paywall, "create an account and login" wall, "fill in your address/ZIP code and we'll give you the best price" wall, government filtering wall).

I'm not saying it's great, or it would be great, but currently having one (or maybe one and a half) search engines is also very far from ideal.


Aka, SMBs now have access to the same tools the tech monopolies do.

GDPR-like policies will continue to flood in as governments partition their Internets and data, making it harder and harder to run international Internet businesses.

I'm not particularly happy about things either (especially crawling access), but it will be a net positive whenever you can level the playing field with competition.

When the biggest infringers of data are driving the creation of government policies that only they can circumvent and navigate -- that's a serious, serious problem.


The article here ignores the view of the Web that Cloudflare has, which coupled with "something you have" (the U2F keys) makes for a compelling alternative to CAPTCHAs.

Sure, bots can automate keys, but those keys could also be banned just as well. Cloudflare only needs to know which ones are the good keys and track those forever. This means, for every non-bot out there, the CAPTCHAs are as good as gone.

The genius of Cloudflare here is that they (ab)use WebAuthn, which can also be implemented natively on Android and iOS. Before you know it, Cloudflare has built an identity platform which, while it may not be helpful for KYC, is plenty useful for websites Cloudflare fronts. Imagine never having to bother with user registration and authentication and bots... that's the next extension I see to all of this.


I agree. I saw the author mention that for $25k you could have 1000 keys, and I immediately thought that is not nearly enough. Given the sheer volume they have, they would start putting a picture together of IP/key/sites very quickly. There is nowhere near enough uniqueness.

I also thought the idea of the key exchange being fast was a red herring; that's a bad thing. If I'm them, I'm paying attention to how long it takes a human to touch the button, from prompt to exchange. On my own setup my key is on my laptop, which is in a dock. I must stand up and tap it over my monitor. It's just a few seconds, but it's a) consistent in timing, b) not < 1s. Imagine the aggregate timing data they have.

Overall they make some good points if you are a teeny tiny player and completely ignore the scale Cloudflare is operating at.


Each key is associated with a batch of devices, though. If you ban a key, you risk banning a bunch of legitimate users.

It's an interesting trade-off. It seems like batch keys for device attestation was designed to help protect individual privacy (good), but if you can't ban a key without potentially a lot of splash damage when you detect a bad actor, that seems like a very limiting choice.


The intent of attestation is that a business could decide, OK, we think FooCorp are doing a proper job and we trust their FIDO tokens, but we don't like all these dozens of cheap alternatives. So for our corporate site we'll require FooCorp tokens, and we'll just issue every employee a FooCorp token on our dime.

Maybe it could make sense for a bank to do this, sending account holders a special custom Security Key with the bank's branding on it. I personally think that's stupid, but I can imagine it appealing to bank executives and it's not so stupid as to be worse than SMS or TOTP 2FA that banks do today.

But it clearly isn't relevant for no-cost services like Facebook or Gmail, and so sure enough you can just tell them you don't want to give them attestation and they work anyway (I don't know if either of them ask, I just reflexively deny attestation if it's requested).

It isn't intended to be useful for trying to do the stuff Cloudflare are attempting here. Which doesn't mean Cloudflare can't succeed in their goals, but in the FIDO threat models a "bad actor" would be a whole vendor: maybe some outfit is using fixed long-term secret keys inside their Security Key products and just sells the NSA a list of those keys, so you might decide to refuse all products from that vendor. Whereas for Cloudflare the "bad actor" they're worried about just buys a half dozen of whatever was cheapest on eBay and then plugs them into a Raspberry Pi.

Or, do they? That's the gamble I think Cloudflare is taking. Maybe the value of defeating this intervention is so low that bad guys will not, in fact, build a Raspberry Pi Security Key clicker proxy to make their thing work.


> Each key is associated with a batch of devices, though. If you ban a key, you risk banning a bunch of legitimate users.

You're right. I meant Cloudflare could ban the generated public key and not the device's public key itself. Besides, they could also mark the batch as taken over by bots and increase the level of challenges issued to the batch. Note though, a single secure module can only generate/store so many public keys. For instance, the YubiKey 5 supports up to 25 keys; those could be reset to generate a newer set of 25, but repeated registration of a number of keys from a single batch is bound to trigger some statistical anomalies.

From Cloudflare's blog about Cryptographic attestation of personhood https://archive.is/4EbER

> For our challenge, we leverage the WebAuthn registration process. It has been designed to perform multiple authentications, which we do not have a use for. Therefore, we do assign the same constant value to the required username field. It protects users from deanonymization.

Currently, the user-name field is constant for all users. I wanted to point out that they could amend the registration ceremony to register any user in particular.
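
For what it's worth, the browser side of that ceremony looks roughly like the sketch below. This is just my reading of the quoted blog post, not Cloudflare's actual code; the rp name is a placeholder and the challenge would really come from the server.

    // Inside an async function on the challenge page:
    const credential = await navigator.credentials.create({
      publicKey: {
        rp: { name: "Cloudflare challenge" },                   // placeholder
        user: {
          id: new TextEncoder().encode("anonymous"),            // the constant username mentioned above
          name: "anonymous",
          displayName: "anonymous",
        },
        challenge: crypto.getRandomValues(new Uint8Array(32)),  // in reality a server-issued challenge
        pubKeyCredParams: [{ type: "public-key", alg: -7 }],    // ES256
        attestation: "direct",                                  // request the batch attestation statement
      },
    });
    // credential.response.attestationObject is what lets the server check the manufacturer
    // batch; swap "anonymous" for a real username and you have per-user registration,
    // which is the amendment described above.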


> For instance, the YubiKey 5 supports up to 25 keys

This is for resident keys. A YubiKey 5 supports an effectively unlimited number of non-resident WebAuthn keys, because the returned key handle is simply the private key encrypted with a master key stored on the YubiKey. For authentication, the service sends the stored key handle back to the YubiKey, which can then decrypt it and use the decrypted private key to sign the challenge.
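
A rough sketch of that wrapping idea, purely to illustrate the concept (real authenticators do this in hardware with their own key derivation and formats, not with the primitives below):

    import { createCipheriv, randomBytes, generateKeyPairSync } from "node:crypto";

    const masterKey = randomBytes(32); // never leaves the authenticator

    // "Register": mint a fresh per-site key pair and hand back the public key plus
    // a key handle that is just the wrapped private key. Nothing is stored on the device.
    function register() {
      const { publicKey, privateKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });
      const iv = randomBytes(12);
      const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
      const wrapped = Buffer.concat([
        cipher.update(privateKey.export({ type: "pkcs8", format: "der" })),
        cipher.final(),
      ]);
      // The site stores this blob; at authentication time it sends it back, and only
      // the device holding masterKey can unwrap it and sign the challenge.
      const keyHandle = Buffer.concat([iv, cipher.getAuthTag(), wrapped]);
      return { keyHandle, publicKey: publicKey.export({ type: "spki", format: "pem" }) };
    }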


TIL.

Envelope encryption. Neat. Can WebAuthn keys be (made) a resident key? If so, is that preferred instead?

Conversely, what use case is there for resident keys in context of WebAuthn? For example, if there are multiple master keys, can I switch between them per browser / website (assuming the master key itself is a resident key and not burnt into the element)? Thanks.


The WebAuthn API can register a resident key on the YubiKey. This will basically store the username, private key and domain on the YubiKey. The website then can later request authentication based off a resident key. This will cause your web browser to query the YubiKey for resident keys of the website. You then can select the resident key with the correct username and will be logged in based on strong cryptography without needing to enter your password or username. Depending on your YubiKey configuration you might need to enter your YubiKey pin for this to work. See the screenshot in this comment on a GitHub issue: https://github.com/keepassxreboot/keepassxc/issues/3560#issu...

The website will need to support this of course. Also the amount of storage available for resident keys on the YubiKey is limited.
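
The sign-in side of that looks roughly like this sketch (the domain and options are placeholders); leaving out allowCredentials is what makes the browser ask the key for its stored resident credentials:

    // Inside an async function, on a site where a resident key was registered earlier:
    const assertion = await navigator.credentials.get({
      publicKey: {
        challenge: crypto.getRandomValues(new Uint8Array(32)), // really comes from the server
        rpId: "example.com",                                   // placeholder domain
        // No allowCredentials here: the authenticator offers whatever resident
        // credentials it holds for this site (after the PIN, if configured),
        // and the user picks the account to sign in with.
        userVerification: "required",
      },
    });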


They already have that, but it's just for internal teams. I used it recently to lock down WordPress installations.


It seems to me that forcing the user to go through captcha is a big negative user experience.

Google must be docking points from websites that employ captcha then, right?


It requires loading a JS library and that does dock them points for performance. So yes sites are penalized for it.


It doesn't matter. Googlebot is whitelisted.

(Partially sarcasm, I do know that Google does do some anti-cloaking crawling)


Once they have your ID info, they'll later change the terms to sell it to advertisers. Are they contractually committing to never doing that? No. So they will.


Even if they are contractually committed not to sell your info, that still might not save you:

“Yesterday, the bankruptcy court approved the sale over the objections of several parties, including the Federal Trade Commission (FTC) and third party manufacturers Apple and AT&T who sold products to the bankrupt retailers.

...

The FTC’s objection was made to the court-appointed consumer privacy ombudsman in the RadioShack bankruptcy. Specifically, the FTC’s letter alleged the sale of personal information constitutes a deceptive practice because in its privacy policy, RadioShack promised never to share the customer’s personal information with third parties.”

https://www.jdsupra.com/legalnews/radioshack-bankruptcy-cour...

In that case the judge allowed the sale of the information in contradiction to its commitments.


> In that case the judge allowed the sale of the information in contradiction to its commitments.

Note that bankruptcy always allows things in contradiction to commitments; bankruptcy is all about balancing which commitments will not be fulfilled, and by how much, when a party is no longer capable of fulfilling all of its commitments.

If you don’t like particular commitments being voided in bankruptcy, you want legislation specifically protecting them so that there is a clear legal barrier to voiding those specific kinds of obligations.


Cloudflare is both a great thing and a terrible thing that has happened to the internet in recent years.

Great in that they have a fantastic UI to add your site in, basically shielding the average user from attacks.

Bad from the standpoint that now only Google, Bing, and maybe other big search engines have the capability to actually crawl the internet.

I don't see us getting a massive innovation in search on the internet now that Google has such a massive foothold, and companies like Cloudflare stop innovation from happening.


> I don't see us getting a massive innovation in search on the internet now that Google has such a massive foothold, and companies like Cloudflare stop innovation from happening.

How are we "stopping search innovation"?


Hi!

Thanks for taking the time to reply.

You mentioned "legit" crawlers: what defines a "legit" crawler in the eyes of Cloudflare, and what happens when Cloudflare suddenly decides it does not want to honour that "agreement"? What happens if/when Cloudflare is sold, or the contact who greenlit these smaller "legit" crawlers moves on and whoever takes over decides they no longer agree with said website?

Is a price comparison site a "legit" crawler? What defines a "bot" vs a "crawler" in the eyes of Cloudflare?

Would you need to notify your customers that you now also allow additional crawlers access to their sites, or would they need to opt into it via the Cloudflare dashboard? What happens when you have a falling out with said company of mine (it happens, relationships sour) and suddenly we can't make contact, and then suddenly customers' websites aren't being crawled because we're treated as bots?


There are a lot of hypotheticals here. I think you'd convince Cloudflare, and their customers, if you could name names and mention specific examples.

If you are a price comparison site getting blocked by cloudflare, site owners may be losing sales, and that's good feedback.


Or site owners may actively want to block a price comparison site..

Depending on the industry, etc..


Agreed on this point, and most companies who would want to do these sorts of things would restrict it even if Cloudflare didn't exist.

I really can't think of a good solution. But that's the tricky position Cloudflare is in - how does it balance everything.


Price comparison example: https://shucks.top/ sometimes gets blocked by cloudflare. Most recent was getting blocked from checking B&H.


Crawling a Cloudflare-powered website is basically impossible without resorting to some bodges.

How can you expect someone to crawl a bunch of websites if they are actively blocked from accessing them? Now, you might say site owners can whitelist bots in their robots.txt file, but then again, will the person creating the engine individually ask each company to allow them to crawl?

Also, slightly unrelated but Cloudflare protected websites are almost impossible to access via tor, the captcha never succeeds.


> Also, slightly unrelated but Cloudflare protected websites are almost impossible to access via tor, the captcha never succeeds.

Yes, I've never understood why it's seemingly so important to CAPTCHA me before serving me less than 100kb of read-only, plain-Jane HTML. What sort of "attack" is this stopping? I'm pretty sure the CAPTCHA itself is bigger than half the sites it blocks me from reading.


If you are building a search engine and getting blocked you can always contact me and I'll make sure that the teams that work on bot detection and DDoS are aware. We would like to know because we should not be blocking a legit crawler like this.


What makes a web crawler "legit?"

When I had a site that had millions of pages, I found that sites like Baidu would crawl my site as often, if not more often than Google.

I already felt the relationship with Google was parasitic, but I looked through my logs and never found a single hit that came from Baidu and many of the other search engines that would overload my site.

I was looking at a substantial part of the site running costs going to supporting web crawlers that were not doing anything (1) to help me, or (2) to help end users (if they don't want to send Chinese users to an English-speaking web site, why crawl the site?)

So like it or not I am inclined to only allow Google and Bing in the robots.txt because Google is the only site that sends a significant amount of traffic and because Bing sends some, and Google needs some competition.

There are web crawler behaviors that are annoying: harvesting email addresses, overloading your site, etc. But how do you know who is doing something wrong with the data and who is just collecting it to do nothing with it? (Probably 95% of web crawling ex. Google.)


> So like it or not I am inclined to only allow Google and Bing in the robots.txt because Google is the only site that sends a significant amount of traffic and because Bing sends some, and Google needs some competition.

This sounds like you're onto a reasonable “legit” factor: does the crawler honor robots.txt? Baidu would be legit because they don't lie about their identity and if you put a rule in your robots.txt file they'll honor it.


For every developer that sees this message, a few dozen will have given up.


Exactly this. Cloudflare actively blocks legit crawlers. It shouldn't be dependent on seeing some random hn comment from some random at cloudflare to get that fixed.


Say I’m interested in building a small scale domain-specific search engine and only just started development. There’s no prototype yet and may never be. In this situation, how do you determine it’s a legit crawler?

And what about crawlers with even more limited scopes (targeting only a handful of sites) that they can’t possibly be called search engines? Are they ever considered legit?


Be a good netizen? Respect robots.txt. Don't lie in your User-Agent. Don't crawl at a ridiculous rate. All those are a good starting point.
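
In code, that "good netizen" behaviour is roughly the sketch below (the bot name and delays are made up, and fetching/honoring robots.txt is left out for brevity; a real crawler would do that first):

    // Honest User-Agent, a fixed delay between requests, and backing off on 429/503.
    const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

    async function politeCrawl(urls: string[]): Promise<void> {
      for (const url of urls) {
        const res = await fetch(url, {
          headers: { "User-Agent": "ExampleBot/0.1 (+https://example.com/bot)" }, // made-up name
        });
        if (res.status === 429 || res.status === 503) {
          await sleep(60_000); // the site asked us to slow down
          continue;
        }
        // ...parse the response here...
        await sleep(5_000);    // don't hammer the site
      }
    }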


I think the problem is some IPs just straight-up always get CAPTCHAs from Cloudflare even if one’s a good netizen, respect robots.txt, not crawl at ridiculous rate, and not lie in the user agent. One reason is shared IP, which disproportionally affects people from third world countries as their ISPs don’t have enough IPv4 for everyone; but it also happened mysteriously to at least one dedicated IP I used in the past. Your confrontational tone is rather unfortunate, and the problem of course is that you don’t guarantee anything even if the user has done nothing wrong, as is manifest from the choice of the phrase “starting point”.


Then the problem has nothing to do with your crawling.


Do you not realize the more fundamental problem with you, as a company, essentially being the one who gatekeeps crawler access to the web?


I use Cloudflare of my own free will because there's malicious traffic out there, and I have enough control over everything.

They're only a gatekeeper because sites voluntarily enter into commercial agreements with them. There's no coercion or manipulation like Google AMP.


Customers pay for this as a feature. Why would they feel it's a fundamental problem? There's nothing that says admins need to let you crawl their site.


If people intend this to happen, sure. But how many people who put their sites behind Cloudflare are aware that this might be a side effect?


I would wager that most people that purchase Cloudflare are probably aware of the features it offers


Note the distinction. I'd wager that the vast majority of sites behind Cloudflare are not paying customers, and have not paid much attention beyond "hides my server IP slightly and stops DDOS's", without having thought more - or at all - about the wider implications.


This is exactly the case.


Even then, I've run into issues with scraping several sites for the reverse image search engine I operate. Luckily, in most cases I have been able to get in touch with the people running those sites to get a rule added for my IPs to allow them through. That's not scalable though, and limits where I can scrape/crawl from. Even something as simple as checking a site for updates every hour or two tends to get blocked after a few times. TBH, one of the only things I have found which helps is lying in the user agent and copying CF cookies. Luckily, I haven't had to play with that for a few months due to whitelisting, so I'm not sure if it would still help. Things change rapidly.


What is a “ridiculous rate”? Where is it documented?


What's the best way to contact you?


jgc @ cloudflare


By being gatekeepers on which website crawling is okay and which is not.

No such filters should exist. Is it really that awfully bad without anything but basic filters (ban an IP for flooding)? Are there like, operations that try and spam every single Cloudflare-hosted website 24/7?

Legitimately curious if your anti-bot measures come from actual bad experience with the internet or is it just a liability limitation move? (Namely to reduce potential suing surface by angry data owners and/or three-letter agencies.)

---

Basically, if I am experimenting with a basic crawling program and I hit websites A and B 20 times each in a space of one hour, is that really deserving of a captcha or extra auth methods?

Not flaming but I am really curious. Do you have any data and rationale posted somewhere that go into deeper detail about why Cloudflare's bot detection is how it is?


Cloudflare's customers request and then enable those features. Cloudflare itself doesn't give a damn about that traffic; they have bandwidth to spare. They will, however, happily sell tools to people that do care.

That isn't to say that the customers are savvy and have a good understanding of different types of automated traffic and which automated traffic is harmful and which is benign. Many have a quite naive understanding that doesn't extend beyond "bots = bad, unless it's Google" and dial protection settings to the max for no good reason.


For instance there is no way for distributed search engines to work with CloudFlare. No, "contact me and we'll help" is not always a solution.


Please explain the problem (here or via email to me).


It is great that you care, and I guess others have already provided some examples, but I'll add my own 2c here. The obvious problem is that a centralized service like Cloudflare creates an entry barrier and makes the large players in the search and data-mining markets even more entrenched than ever.

Recently your company announced a partnership with the Internet Archive, but if Cloudflare wants to keep playing the role of a benevolent party, everyone should have equal access to this data. Yeah, it means that some bad actors will be able to easily scrape the web too, but...

Cloudflare's service can't prevent scraping anyway. There are shady residential proxy networks, services to bypass captchas, and scraping software like Zennoposter. It's possible to make scraping more expensive, but bad actors don't care because they have money. Unfortunately, enthusiasts, open source projects and small companies don't have enough resources to do the same.


Making scraping harder definitely reduces scraping. Some bad actors will get through, but others will be deterred.

I think you might not understand that it's site owners like me who want to stop scraping. It usually comes from specific bad incidents, like copycat sites stealing our content and work.

Cloudflare wouldn't block scraping if website owners didn't want it. And website owners can easily disable this protection.


Scraping protection is not the problem: the defaults that Cloudflare promotes are. Saying that website owners can disable it is akin to saying website owners should go and whitelist Tor nodes. Most website owners don't understand either issue, and they are never going to opt out.

Also, I'm talking from experience because I've been on both sides of the fence: doing scraping and implementing protection. So yeah, your Cloudflare protection will deter 10% of bad actors, but it will also cut off 99% of enthusiast/research efforts and users of niche software or browsers. Still, anyone with a $1000+ budget will scrape whatever they want.


I’m really sorry, but you appear to be the CTO of Cloudflare, which makes your not knowing the ins and outs of the problem already and basic questioning of it seem like sealioning.[1]

[1] https://en.m.wikipedia.org/wiki/Sealioning


I do not understand what the parent means by a "distributed search engine" and I do not know what problem they are facing.


https://yacy.net/ for example. Each interested node does the indexing and serves some chunk of the results.

Or in practice - each node quickly runs into a CloudFlare captcha preventing it from indexing content for a few hours/days. Since CF fronts a lot of the useful internet these days, it means it's effectively working against distributed indexing with its current captcha solution.


Thanks. I'll bring this to the attention of the bots and DDoS teams.


Yacy is 18 years old and not exactly obscure. If your bots team is unaware of it it's because they've chosen to be ignorant of it.


A search engine which is not run centrally by one organization on infrastructure in a known network, but rather something like YaCy where individual users run crawler nodes on networks that vary over time.

Which makes "contact us for an exception" a no-go, as the relevant source IPs will constantly be changing.


Yeah, I certainly don't want those crawlers anywhere near my servers. Block by default and allow site admins to unblock should they want to seems like the best way. It is also already how it works with Cloudflare.


>No, "contact me and we'll help" is not always a solution. reply

In fact it's a textbook definition of a non-scaling solution.


That response is just a way to move the discussion out of the public domain without actually addressing it. It’s a scam.


jgrahamc is one of the most active HN users. He's in the top 20 on the "leaderboard". He has a good reputation. I'd hesitate to call this offer a scam.

That said, I don't think it's a good situation that this is the solution rather than a proper, documented position that people can work to.


Dismissing the CTO of a big company who comes to talk with us as a "scam" is very counter-productive. Some of Cloudflare's bad sides are certainly by design and cannot be changed, but they can still change their default filtering policies in a way that will greatly help the open web.


I've never been able to "reach a human" at Google, Facebook and other web giants, and I'm skeptical that you can at a place like Cloudflare. In fact, I'd be really astonished if it were possible, because otherwise their business wouldn't be scalable.


I personally love Cloudflare and (just like with, e.g., DigitalOcean) I've always found a way to contact a human there. Unfortunately, that doesn't fix the fundamental issue of how they make the internet much more centralized and easier to MITM or censor.


The grandparent, jgrahamc, you're responding to is the CTO of Cloudflare.

If this doesn't at-least meet your definition of "reach a human at Cloudflare", I'm not sure what will.


While I share your sentiments regarding some other companies, I was able to get in touch with an actual Cloudflare technician (and not some outsourced first-level support with standard boilerplate replies) in a timely manner, even on their free tier, when I ran into a problem with one of their systems. Every support case with them has so far been a real pleasure compared to what you experience with other companies. I only hope they will be able to keep this level...


Cloudflare support has been exceptional to me as a website owner.


Did you contact anyone at Cloudflare for an issue not explained in the support docs and you got no response?


I trust that cloudflare will act responsibly in allowing small search engines through, but I really, really would rather not have to trust cloudflare. I don't believe that any organization can or will always act responsibly, which is why it's concerning that cloudflare controls so much of the internet.


Yes. This. I believe that John Graham-Cumming is genuine in his statements in this thread re: "contact me if you're running afoul of our controls", for example. If he leaves Cloudflare, Cloudflare "turns evil", etc, then that's all out the window.

Individual companies having so much power gives me the willies.


How does my startup crawl Cloudflare sites without paying a hefty fee to Cloudflare?



This will scale wonderfully!


No, what scales is us making our DDoS and bot detection not disrupt the crawling of legit search engines that respect robots.txt, don't crawl at ridiculous speeds, don't do dumb stuff like pretend they are the Googlebot. We have teams who work on that. You can read more here: https://blog.cloudflare.com/tag/bots/

But let's suppose someone is building a new cool search engine and our ML stuff is blocking them. Then... contact us/me.


So for my startup to crawl sites I must now adhere to Cloudflare’s Requirements of the Web(TM) or reach out to an individual engineer, who may leave at any moment. Gotcha

(but Google is allowed because Google was first to market)


Why would you possibly think you can do whatever you want to someone else's site?

Yes, you must adhere to the controls that site administrators put in place, like Cloudflare.... You don't get to blast my site with requests, just because you want to...


(a) Who said I was blasting your site with requests? Cloudflare stops much more than just blasts

(b) But you’re a-ok with Google doing this. Gated communities aren’t really good for anybody but I see what you are saying.


Gated communities are great. They lower the risk of crime significantly: https://www.sciencedaily.com/releases/2013/03/130320115113.h...

The same is true online. Apple's walled garden has kept hundreds of millions of people safe on their device. It's why iOS malware isn't a thing.

> Cloudflare stops much more than just blasts

Exactly. There's even more benefit to Cloudflare than just DDoS protection. Captchas for stopping credential stuffing, for example.


..Didn’t realize my startup search engine stuffed credentials :(

But hey if I pay Cloudflare enough, then I’ll get to blast your site and possibly stuff creds at the same time :/


That doesn't sound unreasonable. Out of interest, what would you consider a ridiculous speed to be crawling at?


I can't speak for Cloudflare, but crawling speed should be dictated by the site owner via the robots.txt crawl-delay. [1] A site owner could also rate-limit unauthenticated requests by IP (using the client IP header Cloudflare passes along) and serve a 429 Too Many Requests error page.

[1] - https://en.wikipedia.org/wiki/Robots_exclusion_standard#Craw...
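
For example, something along these lines in robots.txt (note that Crawl-delay is non-standard and ignored by some crawlers, notably Google's, so treat it as a hint):

    User-agent: *
    Crawl-delay: 10
    Disallow: /search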


This here is the problem. It’s a new era: no one wants to be RFC compliant, just go behind a service and the problem is solved.

So, no problem, time to move on; web search is no longer exciting.


It seems to be by design.


...also Cloudflare has been a disaster for Tor. It really harms Tor's usability.


Maybe I'm cynical, but I don't see any innovation to be done in search. Google results have become much less useful over the past few years. If they cannot solve search with basically unlimited resources, how is a tiny company going to?

1. Filtering ever increasing trillions of spam/clickbait pages

2. Figuring out which results are useful information vs corporates trying to sell something.

Those problems are not solvable by a couple of guys in a garage.


> If they cannot solve search with basically unlimited resources, how is a tiny company going to?

Reminder: Google Is An Ad Company. Are you sure they actually want to solve search? Their primary interest is to be that corporation selling you something.


It's not clear to me that Google still gives a fuck about solving search/organizing the world's information and make it useful. The mess they've incentivized the web to become is very profitable for them.


Maybe not, but the garage guys should look into becoming the best search engine in a niche and expand from there.


Around 2000 there was a CS professor who claimed that the internet does not scale: either things choke up or you need huge investments in networks.

It turns out that he was right, sort of. The vanilla attach-a-server-to-the-internet, server-to-client IP network is pretty much dead. It has been replaced with CDNs, private delivery networks, cache on top of cache. Cloudflare, Amazon, Google and MS are the connection points for IP. Their internal network infrastructure transfers most of the data.


Is it pretty much dead? Yeah, if you’re moving FAANG level traffic you need something more fancy than LAMP + an internet connection, but I’ve seen dozens and dozens of sites with a plain old no-cdn, no-pdn, LAMP tech stack. Working with startups might bias your view - lots of companies are running extremely boring setups and they work just great.


Respectfully, I think you miss the point.

The fact that 98% of traffic goes through this new infrastructure allows some to still plug their server into the net raw and have their traffic still get through.


The traffic goes through this "new infrastructure" not out of necessity but because it's free and allows the user to pretend like a number of problems don't exist.

Terrestrial optical networks operate far, far below capacity to create artificial scarcity, which is to a certain extent necessary to recoup the capital expenditures in a competitive market, and is to a certain extent an abuse of an under-regulated natural monopoly.

If you could eliminate all adversarial factors and put every data service subscriber's payment for a single month into a pool, and that pool purchased only transceivers, passive optics and switches, and this hardware was distributed to every network operator perfectly fairly based on its contribution to the global maximization of available network capacity, then the delivered capacity to the end user could increase by something like 4.5 orders of magnitude with no substantial change in topology or subscriber or provider costs afterwards using the existing fiber, with a few more orders of magnitude possible with a fatter tree before the backbone costs explode.

With DWDM you can carry 100+ channels of 100Gbps over a single fiber today, with commodity, off the shelf components. Most fiber in the ground today is probably still lit with a single wave of 10G, if it's not just dark.

This distribution model is not even remotely a technical necessity, it's an arbitrary local minima reached largely by exploitative market distortions and adversarial economics.


> The fact that 98% of traffic goes through this new infrastructure

Doesn't mean that 98% of the value of the Internet results from this traffic.

Even if you discount all of the web, there are still lots of applications using the federated model (e.g. SMTP) or peer-to-peer (e.g. crypto, VoIP) that require end-to-end connectivity.


You're expanding this discussion in a completely new direction.

My original point is about capacity and whether the old internet could work today. It seems like I'm correct, but I'm not so sure; I would like to see other opinions.

Every response so far is "there exists". The real issue is whether the internet could do everything without caching data near the edge.


No. The marketing of all that extra crap has gotten better.

It's just like Windows - just because 95% of the Internet does something one way doesn't mean it doesn't suck, isn't more complicated than it needs to be, and doesn't cost more than it needs to cost.

Anyone with a little bit of bandwidth and a Raspberry Pi can run a web server, even with dynamic content.


This feature is enabled at the behest of the site owner. I feel like site owners and operators should decide who gets to visit their site. Am I missing something obvious here?


Is this post missing the point, or am I? I thought that the point of requiring attestation is to prove that someone actually did go out and buy a legitimate Yubikey (or whatnot) and ban that key if they're spamming.

With those two considerations, this actually seems like a really good idea to me.


It doesn't appear to identify any specific key, just that the user has a yubikey. You could only ban a whole key manufacturer (or key batch, however large that is).


This seems to assume that existing captchas are much better than they actually are.


How does this so-called 'CAPTCHA replacement' idea compare to Sign in with Apple, which also does not use any CAPTCHAs and aims to prevent bot sign-ups?


Apple Sign-In is just an OpenID federated login; these don't inherently provide any anti-automation or rate limiting; they just push the problem to the Identity Provider.

IdPs like Apple/Google/Microsoft might do a fine job of limiting you to "one account per $unit-of-hardware"; Apple in particular can do this via iOS attestation. But then you're limited to either their heuristics (in the case of MSFT/Google) or their hardware (in the case of Apple).

Ultimately this is apples-to-oranges, though, since Cloudflare is not offering an IdP product but simply an anti-automation solution. If you use federated auth, you're getting (and giving up) a lot of other stuff beyond just anti-automation.


Maybe. I would have to read the FIDO group’s discussions on this to be confident in my opinion. :)

I personally enjoy the ability to use my employer-provided authenticators on my personal accounts. I think this is both convenient (only one thing to carry) and helps adoption (I got a key “for free”). So maybe enterprise use is a Trojan Horse to promote consumer adoption?


Hmm. I replied to the wrong post.


Could this be solved (in large part) if key makers like YubiKey did I.D. verification on purchase? Then, to do the type of "farming" that's mentioned in this article, you'd need to organize a large group of people to all buy the keys rather than just submit a bulk order to Alibaba.

Of course this idea raises privacy and authority concerns, similar to certificate authorities.


> key makers like YubiKey did I.D. verification on purchase?

I think that would require that every key maker would have staff in every city in the world who were trained to inspect your identity documents, check for forgeries, and not be susceptible to bribes or coercion.

Or at the very least every city would have to contain at least one location where someone from some organisation (possibly the government) could carry out this process.

The "authority" concern would require that the verifying organisation could be blacklisted if they started giving out too many IDs to the wrong people, or refusing to give IDs to the right people. (Perhaps you've played the game "Papers, Please".)

It's the "privacy" concern that worries me more. What happens if someone tries to buy a second key? Presumably there is a limit to how many each person can buy, so if someone says they lost their previous key, the issuer needs a way of revoking it. But how do they do that without creating a valuable list somewhere of which key belongs to which real world ID?

To make matters worse, because people can move between cities and jurisdictions, this database of online IDs would need to be globally shared between all verifiers, otherwise you could just buy multiple IDs from multiple vendors. That means the database that ties your browsing history to your legal identity will be accessible to basically everyone in the world, because the access to that database will only be as secure as its weakest link.

If we're going down this route, we might as well put everyone's identity onto a blockchain and let people vouch for each other in order to establish trust/reputation. In fact, that sort of system has actually been implemented; it's called BrightID, and it "requires no personal information, letting you prove your humanness without risking your privacy".

https://www.brightid.org/


Can a FIDO key be implemented in software? Can you write a program to register a FIDO key as a multi-factor authentication device with a Google account?

Or is there some repository of all allowed devices with identifiers? Intuitively, that'd be the only way to prevent infinite virtual devices...


Attestation (what they use) is orthogonal to authentication. Token manufacturers have per-batch keys, with the private key living in the devices of that batch, so sites can verify that your device is from that batch of that vendor. You "can" implement attestation with your own key in software or in whatever, but Cloudflare won't trust your key :D
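
Conceptually, the server-side batch check boils down to something like the sketch below. A real WebAuthn verifier also parses the CBOR attestation object and checks the signature over the authenticator data; this only shows the "does the attestation certificate chain to a vendor root we trust" part, with placeholder inputs.

    import { X509Certificate } from "node:crypto";

    // The attestation certificate presented by the token is shared by its whole
    // production batch, so passing this check says "made by this vendor/batch",
    // not "this exact device".
    function fromTrustedVendor(attestationCertPem: string, vendorRootPems: string[]): boolean {
      const attCert = new X509Certificate(attestationCertPem);
      return vendorRootPems.some(rootPem =>
        attCert.verify(new X509Certificate(rootPem).publicKey)
      );
    }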


So how does cloudflare know to not trust your key? They know some secrets from the token manufacturers and test your response against them?

If so, what if the secrets get out? Then all the keys in those batches are poisoned?

Isn't this just some sort of side channel certificate authority?


Token manufacturers are the CAs for their own tokens, Cloudflare would know these CA public keys.



And if you wanted to read more in depth about attestations here is our latest article https://news.ycombinator.com/item?id=27279482


A very important addition to the article is that you don't have to have a Ryzen for this, just a motherboard with a chipset good at handling 128 or 256 USB IDs simultaneously. I speak from experience.


There are a lot of services in Sweden that require real authentication using something called BankID, basically a personal digital ID. This is the way to go: 100% securely validated users. If a function were added to make the users anonymous to third-party services, it would be great.

I work with Cloudflare sites and it is clear that their current enterprise offerings are hard to tweak to stop attacks without spamming the users with captchas. The captchas are already too complicated for the average user, so they are mostly turned off even though that has other consequences. I have to look into this new thing though.

As much as I like the idea of an open Internet to be used by anyone from anywhere, it simply does not work today for a lot of enterprises.


Ah yeah, instantly zeroing anonymity for most users while it can still be abused by malicious actors: a great worst-of-both-worlds solution.


Not sure how you came to that conclusion from what I said. What I am saying is that in the long run it will be impossible to rely on companies like Cloudflare for security when it comes to users. Over time, all services will, for security reasons, need to either directly or indirectly authenticate all their users. That does not mean that each user's identity is provided to each consumer.

The open Internet is already dead thanks to Cloudflare, Akamai etc. A lot of European companies use these services to block China, Russia, Tor, VPN services and so on.


It just alluded to an internet strictly divided between "verified" individual users (persons, whether or not verified per service) and everyone else (assumed malicious by default) as a total and unavoidable path.

I think the nature of the internet necessitates temporary, limited gatekeepers like Cloudflare, but as long as the legal status and the technology keep presenting moving targets (like how you can fake your way out with a dedicated-IP VPN, or just remain within the Tor network) this isn't too much of an issue. Anonymity plus trust will never be a fully solved problem, and as long as a comfortable level is achievable between willing parties, the status quo remains.


I mean, there are people making $1 per 1000 reCAPTCHAs solved. So I'm not sure how a $40 device is not an improvement if your goal is to enforce some rate limiting against scripts that use these services.


See also the original announcement: https://news.ycombinator.com/item?id=27141593


Privacy seems like a bad argument considering CF already has the technical ability to easily track you across all of the sites they front if they so desire.


With each domain, U2F generates a different key (conceptually), so this should potentially be harder to track.


The attestation key is the same for each website the device logs in to.


Reading CF's blog announcement [1], this is really horrifying. It trains users to insert security keys and accept biometric identification requests when visiting random web pages, on random untrusted domains.

This cannot possibly end well.

[1]: https://blog.cloudflare.com/introducing-cryptographic-attest...


Isn't part of the point that a phishing site wouldn't get the same response as a legit site, and therefore it'd be useless to do that, so this behavior is OK?


It's not about phishing, it's about getting the user to blindly accept security checks. If users are trained to insert their usb key / scan their fingerprint whenever they see a cloudflare page, bad actors can present a mockup of this page to exploit that reaction.

Physical keys (and biometrics) work well because they are rarely called for, and the user knows they are doing something security sensitive. "This random page asked me to insert my security key" can't be healthy.


No. Ignorance is a big problem. What makes Security Keys work well isn't that they are "rarely called for" but almost the opposite, they're so easy that you can add them with little friction all over the place. Tapping to sign into a remote server over SSH is no problem, it's scarcely more effort than thumping "enter" on the command is.

What the user is doing is not security sensitive. They are, in fact, themselves, and that's all the Security Key is confirming. "Yup, still me".

One of the easy ways fools trip themselves up here is that they think this is identifying information. But it isn't. "Yup, still me" doesn't identify anyone. The identity was already known to your interlocutor, which is why "Yup, still me" is enough.

And that's what's so clever about the FIDO design. A Security Key has no idea who it "is", it just knows it's still the same as before. If you're already authenticated as Jim Smith, you can enroll a security key "Yup, still me" -> the Relying Party stores the information, and then you can later sign in using it to verify your identity, "I'm Jim Smith". "Is this still you Jim Smith?" "Yup, still me".

So that's why this doesn't help bad guys. "Are you still er... you?" "Yup, still me". Completely useless. Of course you are, that doesn't help them at all.


But wouldn't the response the mockup gets only work for that page, not be something they could pass through?

If any page could request any other page's response, that'd make the whole system pointless.


Do you think it won't become normal for people to present a fingerprint or a face scan in order to buy and sell things online? We already have a similar process for people visiting businesses during a pandemic.



What am I missing? I get this device and can crawl all I want?


So visiting Cloudflare sites with Tor requires you to identify yourself? That's not great.


Visiting many CloudFlare sites with Tor was impossible the last time I checked because their CAPTCHA is broken and has been for a long time.

I know for a fact that there are staff at CloudFlare who are aware of this problem but nothing has changed, so I guess that they don't care that they are making some sites unavailable to anyone who has to use Tor.


They implemented Privacy Pass for that, which is kind of neat [0] and related to another standard for authn viz. OPAQUE that I really like [1].

[0] https://github.com/privacypass

[1] https://news.ycombinator.com/item?id=25346632


Can you see the difference between 'identify yourself' and 'prove you're human'?


No?


Edit: I re-read the section on the U2F batch keys and understand that the design intent is to be unable to track individual tokens across sites (only batches of a size decided by the token manufacturer). It's not completely clear to me if the crypto involved is resistant to an attacker who can collect the handshakes and then later gets access to the key(s) that are meant to be private to the manufacturer(s), but I acknowledge that the intent is decent. My points still stand, however.

This sort of "we can solve that problem; we just need to kill your privacy" seems to be par-for-the-course in SV-style companies.

I really wonder if anyone involved with building these systems has ever seriously thought about what could happen if the data collected (or that could be collected) by these systems was obtained by an adversary.

Not to mention the incredible incentive problems created by designing systems in a way that _requires_ that individuals be tracked across the internet.

I know that CloudFlare is just one of many companies that is moving in this direction, and they're certainly not the worst offenders when it comes to slowly murdering individual privacy (Facebook and Google are obviously far worse), but they have a uniquely powerful position due to the number of sites that use their DDoS protection, and they seem to have a casual disregard for the damage that they can do to people's privacy.


The build quality of those Yubikeys freaks me out. I wonder how many insertions it takes until something shorts and my motherboard gets damaged.


The idea of replacing CAPTCHAs with FIDO doesn't seem sound; isn't it trivial to imitate with DevTools in Chrome or some other software?

https://developer.chrome.com/docs/devtools/webauthn/


The attestation process is capable of cryptographically checking the device manufacturer etc

(although practically I'm unsure as to whether that's really a good idea or would work well)


I believe the idea here is you need to buy actual FIDO U2F keys and they could then be revoked on a per-key basis if you're caught abusing them as they're signed by a 3rd party so can't just be emulated.

Meaning you need to buy more. Makes it expensive at least.


But I'm specifically asking about software. I know Touch ID can be used with WebAuthn, and I also see the WebAuthn debugger in Chrome DevTools. It just seems easy to fool if I regenerate a key on every visit, unless there is an additional step I don't get.


You can't generate an attested key with the devtools. It won't be signed by a 3rd party that CloudFlare has approved.


Ok, you're right. I did miss a step!


How can you revoke on a per-key basis without at the same time being able to track keys uniquely?


You can't. The device attestation protocol specifically excludes the ability to uniquely identify devices (the key has to be re-used in at least 99999 other devices).


Yup, you can't. Keys are perfectly trackable by Cloudflare, but they promise they won't do this.

Edit: I was wrong. Cloudflare claims they could track people, but it would require tracking via cookies. [1] The hardware security keys have an "attestation key pair" that is shared among all units in one production batch (which contains at least 100K units). [2]

1: https://blog.cloudflare.com/introducing-cryptographic-attest...

2: https://www.w3.org/TR/webauthn-2/#sctn-attestation-privacy


No, they can't do this. It's the U2F key vendor that promises not to release a device-unique key to someone like CF.


Thanks, I stand corrected.


I'm not terribly familiar with U2F itself, but I assume the site has a way to identify you're using the right key that can be reused for this purpose?


When you enroll a token with a site, the token mints a random new key pair and sends the site an ID and the public key, signed with the private key.

The site records the ID and public key.

When you return, to confirm it's really you, the site sends one or more IDs you've enrolled and says, sign this fresh random data with one of the associated private keys.

Your tokens can look at the site and an ID and decide if they made that ID for that site; if they did, they sign the message with the private key, proving you are still you. If they didn't make it, they pass, and maybe you own a different token that can sign, or maybe you show them a different ID they do recognise.

To reuse this capability for tracking, the site would need to guess who you are first. "I guess this is arsome, they have this U2F key". But if they can guess who you are, they already don't need such tracking.
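
In WebAuthn terms, that "sign this fresh random data" step is the assertion request; roughly like the sketch below, with placeholder bytes standing in for the credential ID the site stored at enrollment.

    // Placeholder for the credential ID bytes the site recorded at enrollment.
    const storedCredentialId = new Uint8Array(32);

    // Inside an async function: "sign this fresh random data with one of the
    // associated private keys".
    const assertion = await navigator.credentials.get({
      publicKey: {
        challenge: crypto.getRandomValues(new Uint8Array(32)), // fresh random data
        allowCredentials: [{ type: "public-key", id: storedCredentialId }],
        userVerification: "discouraged",
      },
    });
    // The token only signs if it recognises one of the IDs as one it minted for this
    // site; the site then verifies assertion.response.signature against the public
    // key it recorded at enrollment.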



