Hacker News
Friendly Captcha – GDPR-Compliant Bot Protection (friendlycaptcha.com)
46 points by kosasbest on Aug 8, 2023 | 44 comments



> Friendly Captcha generates a unique crypto puzzle for each visitor. As soon as the user starts filling a form it starts getting solved automatically. Solving it will usually take a few seconds. By the time the user is ready to submit, the puzzle is probably already solved.

What makes this NOT work on a bot machine?


It sounds like a proof of work rate limiter similar to something like hashcash. I don't think it will stop a bot machine, just make it very expensive to use. Which is actually all regular captchas do anyway.
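For readers unfamiliar with hashcash, here is a minimal Python sketch of a generic hashcash-style puzzle, assuming SHA-256 and a "leading zero bits" difficulty target. It is not Friendly Captcha's actual algorithm; the function names and the 16-byte challenge / 8-byte nonce format are hypothetical. The property that matters is the asymmetry: solving takes about 2**difficulty hashes on average, while verifying takes a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of the digest are zero."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in this byte
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> bytes:
    """Server side: a fresh random challenge per visitor (hypothetical format)."""
    return os.urandom(16)


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce; expected work is about 2**difficulty hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: checking a submitted solution costs one hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=16)
    print(nonce, verify(challenge, nonce, difficulty=16))
```

Raising `difficulty` by one bit roughly doubles the client's expected work. That is the only knob such a scheme has: it makes each request more expensive for a machine, not impossible.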

Whenever this comes up as an alternative to regular captchas, I see a lot of pushback that we can't do this because it would cost mobile users too much battery power. If that is really such a concern, let the mobile users solve shitty captchas as an alternative and the rest of us use something like this. Mobile users already endure horrible privacy, no easy ad blocking, countless "install our app" popups and a software ecosystem that is infested with dark patterns, so I don't see how they would really even notice.


> I don't think it will stop a bot machine, just make it very expensive to use

My phone solves the captcha puzzle in about three seconds. I assume it's working on one core. If you're running this on a server and it's able to do one every, say, two seconds, and you have sixteen cores, that's still about eight per second. At that point, what is this defending against? You're running into API rate limit territory.
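The back-of-the-envelope numbers above, written out as a tiny Python sketch. The 2 seconds per solve and 16 cores are the commenter's own "say" figures, not measurements of Friendly Captcha, and the sketch assumes each core solves puzzles back to back.

```python
def solves_per_second(cores: int, seconds_per_solve: float) -> float:
    """Throughput if every core finishes one puzzle every `seconds_per_solve` seconds."""
    return cores / seconds_per_solve


rate = solves_per_second(cores=16, seconds_per_solve=2.0)
per_day = rate * 60 * 60 * 24  # seconds in a day

print(rate, per_day)  # 8.0 691200.0
```

That is roughly 690,000 puzzle solutions a day from one machine: the puzzle throttles a single box without stopping it.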

The whole point of a captcha is to make it unsolvable for a machine. Not to make it more expensive. Because the bad actors will eventually make it cheap, and then it's not effective anymore. Consider that today, it's often cheaper to farm out CAPTCHA puzzles to a room full of humans on laptops than it is to solve them. Making it a purely computational challenge is almost certainly saving money for the bad actors.


> At that point, what is this defending against?

I have seen spam attacks against web forms running at hundreds of calls per second. In the end we ran our own solution; a simple math captcha was all it took.
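The comment doesn't say how that math captcha worked. One minimal way to build one, sketched here in Python under stated assumptions, is to sign the expected answer with an HMAC so the server doesn't need to store anything between requests. `SECRET_KEY`, the token format, and both function names are hypothetical, and this sketch does not stop a solved token from being replayed; a real deployment would also add an expiry or remember used salts.

```python
import hashlib
import hmac
import secrets

# Hypothetical server secret; in practice load it from configuration.
SECRET_KEY = secrets.token_bytes(32)


def new_math_captcha() -> tuple[str, str]:
    """Return (question, token); the token binds the expected answer without revealing it."""
    a = secrets.randbelow(10) + 1
    b = secrets.randbelow(10) + 1
    salt = secrets.token_hex(8)
    answer = str(a + b)
    mac = hmac.new(SECRET_KEY, f"{salt}:{answer}".encode(), hashlib.sha256).hexdigest()
    return f"What is {a} + {b}?", f"{salt}:{mac}"


def check_math_captcha(token: str, user_answer: str) -> bool:
    """Recompute the MAC for the submitted answer and compare in constant time."""
    salt, mac = token.split(":", 1)
    candidate = f"{salt}:{user_answer.strip()}".encode()
    expected = hmac.new(SECRET_KEY, candidate, hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)
```

The question goes into the form as text and the token as a hidden field; on submit, the server calls `check_math_captcha(token, answer)`.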


In college (2010) I built a honeypot to test this. Simply adding a field that blocks anything that doesn't run JavaScript worked in most cases. And that makes sense: a lot of this junk is garbage like malicious WordPress plugins that crank away to just fire off HTTP requests.

But you don't need proof of work to stop that abuse. The simplest JS with a fallback to a "I'm not a bot" checkbox would do the trick. So you're defending against folks that do run JavaScript, but...not fast?
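A minimal Python sketch of the server-side check for the kind of honeypot described above, assuming two hypothetical form fields: a CSS-hidden `website` input that humans leave empty, and a `js_check` input that only a small script fills in. The fixed "ok" value is trivially copied by anyone targeting the site on purpose; it only filters the untargeted junk the commenter describes.

```python
def looks_like_a_bot(form: dict[str, str]) -> bool:
    """Return True if a submitted form trips either heuristic (field names are hypothetical)."""
    # Honeypot: hidden with CSS, so humans leave it empty; naive bots fill every field.
    if form.get("website", "").strip():
        return True
    # JS check: empty in the HTML and set to "ok" by a script, so clients
    # that never execute JavaScript submit it blank or not at all.
    if form.get("js_check") != "ok":
        return True
    return False
```

Submissions that fail the JS check could be shown the "I'm not a bot" checkbox fallback instead of being dropped outright.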


> If you're running this on a server and it's able to do one every, say, two seconds, and you have sixteen cores, that's still about eight per second.

That's no problem. It's supposed to protect against bots making billions of requests a second.


> It's supposed to protect against bots making billions of requests a second.

Billions of requests per second is the sort of traffic that Google receives in total. Not the traffic to your blog.

The spam isn't the bottleneck here: at the point where you're caring about the actual load it's putting on your system, you're talking about open connections and the number of occupied workers in your HTTP server. Captcha doesn't help with that. You still need to accept the request in order to reject it.

But even if the goal is to just slow down a botnet that's pounding your server into oblivion, this still ain't it. There's no 16xlarge EC2 instance somewhere beating on your server. It's a bunch of malicious Chrome extensions and garbage mobile apps. Why pay for servers when you can have ten thousand people install your software and run it for nearly nothing? The cost of the compute load isn't felt by the bad actor.


Captchas are not just DDoS protection, and even if they were, botnets don't send tons of spam from any single device. Otherwise it would be too easy to identify and block.


That's why you use something like this, where each request incurs a cost for the attacker so it doesn't matter if the origins are distributed.


The attacker doesn't have to calculate the puzzles in one central place. It can do that on the hacked devices.


> It sounds like a proof of work rate limiter similar to something like hashcash. I don't think it will stop a bot machine, just make it very expensive to use

Ah, OK. I was wondering the exact same thing as toxicFork. This makes some sense. It's a shame they don't explain it on their website.

But then the natural follow-up question: why do they keep mentioning blockchain? What's that bringing to the table? If it's just about soaking up processing time, then surely anything computationally heavy would do the trick, so why include something that would set off some people's alarm bells?


I really think it's meant to awe the business customer with a slick-looking demo, along with assurances that it's "made in Europe, GDPR-compliant, and proven accessible" rather than actually doing the job of a captcha. Sorry to be cynical, but it's oversimplifying the problem and just doesn't work (see below).


Nothing. Most JS challenges simply rely on the headless browser not executing the JS, or on the delay and computational cost being enough to render most bot attacks ineffective.


A better question is why you can't just use a token bucket rather than mining bitcoins on your client's phone and wasting their battery.
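For reference, a token bucket is only a few lines. This is a minimal single-client Python sketch; in practice you would keep one bucket per key (IP, session, account), and the `rate` and `capacity` values are whatever the operator chooses. It is only as effective as the key you bucket on, which is what the replies below argue about.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Spend one token if one is available; False means throttle this request."""
        now = time.monotonic() if now is None else now
        # Refill for the time elapsed since the last call, never beyond capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A request handler would call `bucket.allow()` and answer with HTTP 429 when it returns False. The cost to the client is zero, which is both its appeal and, against a large botnet, its weakness.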


Because bots use hundreds of IP addresses assigned to the same system. If you have 5 r/s from 10k IP addresses, it adds up. If you require computational power, you force them to invest money in hardware and potentially make it unprofitable.
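The arithmetic above, plus what a proof-of-work requirement would add, as a quick Python sketch. The 5 r/s and 10k IPs are the commenter's figures; the 20-bit difficulty (about a million hashes per request on average) is an arbitrary illustrative choice, not Friendly Captcha's setting.

```python
def aggregate_rate(per_ip_limit: float, ip_count: int) -> float:
    """Requests per second a botnet can send while every IP stays under the per-IP limit."""
    return per_ip_limit * ip_count


def hashes_per_second_needed(requests_per_second: float, difficulty_bits: int) -> float:
    """Expected hashing work if every request carries a puzzle of about 2**difficulty_bits hashes."""
    return requests_per_second * 2 ** difficulty_bits


rps = aggregate_rate(per_ip_limit=5, ip_count=10_000)
print(rps)                                # 50000
print(hashes_per_second_needed(rps, 20))  # 52428800000
```

Whether that is unaffordable depends on whose hardware does the hashing, which is exactly what the next reply disputes.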


The last botnet I fended off had 49131669 IPs, so believe me, I know (https://ipv4.games/statusz). The issue is that it's not their money. A lot of these botnets are composed of ordinary people's devices that got hacked or hijacked by some slimy mobile app that fires off a DDoS request every ~5 sec or so in the background, and they do it because hacked devices aren't easy to fingerprint. So I feel bad for what's going to happen to all those normal people if the industry pivots to using CPU-hard approaches to defend themselves.


I guess this depends on what kind of traffic you get. In some cases the data they try to push is confidential to them, like their user session. On some systems I switched rate limiting from per-IP to per-session, because thousands of IPs used the same session cookie; that's why I assume all of them run on the same physical machine.
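A minimal Python sketch of keying the rate limit on the session cookie instead of the IP, as described above. It uses a simple fixed-window counter; the class and helper names, and the fallback to the IP when no cookie is present, are assumptions for illustration. It only helps when, as the commenter observed, the bots keep reusing the same cookie: a client that drops it falls back to the per-IP key.

```python
from collections import defaultdict
from typing import Optional


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window`-second interval."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # Counts per (key, window index); old windows are never evicted in this sketch.
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


def limiter_key(session_id: Optional[str], ip: str) -> str:
    """Prefer the session cookie; fall back to the IP when there is none."""
    return f"session:{session_id}" if session_id else f"ip:{ip}"
```

With this keying, thousands of IPs presenting the same cookie all draw from one shared allowance.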


Right. Captchas are supposed to ensure the operation is human-initiated. This solution doesn't work.


Same but free and open source: https://mcaptcha.org/


Been using it for a new project of mine. Mcaptcha is great and extremely easy to build custom components for.


It’s not a captcha


Tried it on my phone and gave up in the end, as it never finished.

I suppose I must be a bot...


I'm curious why they wouldn't mention a differentiator in the title.


GDPR is about how you handle personally identifiable information. And IPs famously don't count as PII unless they are stored in combination with other data that allows linking that IP to a real human being, so I'm having a really hard time understanding why being GDPR-compliant is even relevant to a captcha solution.


> And IPs famously don't count as PII

Please elaborate, as IP addresses are specifically listed as PII on the European Commission's website: https://commission.europa.eu/law/law-topic/data-protection/r...


IP addresses are considered personal data according to [1], and there’s no mention of them counting as personal data only conditionally.

[1] https://commission.europa.eu/law/law-topic/data-protection/r...


What makes other solutions not GDPR compliant?


Probably cookies. Some captchas will try to remember that you are human with a cookie so that you don't have to solve captchas repeatedly. This one advertises itself as not storing personal information:

https://friendlycaptcha.com/privacy/gdpr/


That doesn't violate the GDPR. For a cookie to violate GDPR it has to trace back to personally identifiable information, not just "a uuid'd session". The number of people that get this wrong is staggering.


Because the law is unclear and lots of sites are afraid to accidentally violate it. If you search "do you need a cookies banner to operate in the EU" online, Google's suggested answer is "If your site has EU or UK visitors, you require a cookie banner to comply with GDPR," which you're saying isn't exactly true.


I don't think the law is unclear at all on this point. If your site wants to ride the fine line of what is allowed, you can get into a gray area. But unless you're trying to push the boundary, there isn't any mystery.

> Google's suggested answer

Do not trust Google's suggested answers for anything that matters. If you're really in doubt, consult an attorney that works with these issues.


If you look through several top results, you get other unconditional yeses and some murky maybes. On this topic, Recaptcha doesn't say whether or not it's GDPR-compliant, and searches give unsure answers. You're saying a session ID isn't personal info, but https://commission.europa.eu/law/law-topic/data-protection/r... lists both "cookie IDs" and IP addresses as personal info. Which one is it?

>If you're really in doubt, consult an attorney that works with these issues.

If I have to consult a lawyer just to run a basic website without cookie banners, that means the law is unclear.


> If I have to consult a lawyer just to run a basic website without cookie banners, that means the law is unclear.

My whole point is that you don't need to consult a lawyer for a basic website. You need to do that if what you want to do is near the edge of the law. If you're using cookies for functional website reasons, you don't need to present a banner. That's very clear.


Session IDs _are_ personal data; it's not even ambiguous if you read the definition in GDPR (article 4(1)). You even found it on the Commission's website; it should give you a clear answer.

About cookies, the relevant law is ePrivacy 2002/58/CE, article 5(3), which says you don't need to ask for consent for “strictly necessary” cookies. In practice, this means session ID cookies, user preferences, etc. This also applies to local storage or any other way to store and retrieve data on a user's device.

The issue is not that the law is unclear, it's people that can't help but speculate on its content even though they never read it. Google is full of links to this, and HN is bad in this regard. And to be honest, this is not exclusive to GDPR.

I've found Law Stack Exchange and /r/gdpr to be okay-ish. Otherwise, there is a guide on the Commission's website, there is gdpr.eu, and there is the commented version of GDPR on gdprhub.eu:

https://commission.europa.eu/law/law-topic/data-protection/r... https://gdpr.eu/ https://gdprhub.eu/index.php?title=Article_1_GDPR

You can find a lot of advice on various DPAs' websites (the ICO, and even the CNIL publishes stuff in English sometimes).

https://ico.org.uk/for-organisations/direct-marketing-and-pr...


Of course, the problem with "strictly necessary" is that it doesn't mean what the words mean. Almost nothing is strictly necessary to just serve content when a URL is accessed, so it has been made, quite intentionally, super murky. Beyond a session ID, cookies aren't even strictly necessary for serving content to logged-in users, so there needs to be a place for people to agree to and/or manage their cookie settings as part of their user settings; user preferences, localStorage, etc. are still not strictly necessary to serve login-locked content on a URL.


With all due respect, this is the kind of speculation I was complaining about earlier.

>Almost nothing is strictly necessary to just serve content when a URL is accessed

That's not what the law says.

> 3. Member States shall ensure that the use of electronic communications networks to store information or to gain access to information stored in the terminal equipment of a subscriber or user is only allowed on condition that the subscriber or user concerned is provided with clear and comprehensive information in accordance with Directive 95/46/EC, inter alia about the purposes of the processing, and is offered the right to refuse such processing by the data controller. This shall not prevent any technical storage or access for the sole purpose of carrying out or facilitating the transmission of a communication over an electronic communications network, or as strictly necessary in order to provide an information society service explicitly requested by the subscriber or user.

Emphasis mine. It's not to just serve content, but to provide a service requested by the user. This should clear up the confusion.

Full text here: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL...

Anyway, I provided a link from the ICO that explicitly says it's OK for user IDs, user preferences, etc.

The CNIL agrees: https://www.cnil.fr/sites/cnil/files/atoms/files/lignes_dire... See point 49.

The EDPB agrees: https://ec.europa.eu/justice/article-29/documentation/opinio...


I knew about this part of the law too, and it sounds like captchas don't count as "strictly necessary" based on the original part and the Opinion 04/2012 on Cookie Consent Exemption you linked, but I'm not sure.

Criterion A: "Simply using a cookie to assist, speed up or regulate the transmission of a communication over an electronic communications network is not sufficient. The transmission of the communication must not be possible without the use of the cookie."

Criterion B: "A cookie is necessary to provide a specific functionality to the user (or subscriber): if cookies are disabled, the functionality will not be available."

For B, they say for example that a session ID to keep a user logged in is fair to use without asking, provided the user explicitly wanted to log in.


If you use a captcha to secure your service, they can be. See article 4 of the ePrivacy directive. This is also said in section 3.3 of the EDPB guideline.

The issue with Google's reCaptcha, according to the CNIL at least, is that they use data collected through the service for their own purposes. See https://www.legifrance.gouv.fr/cnil/id/CNILTEXT000047346903, point 86. DeepL translation below:

> If a data controller can claim exemption from the requirement to provide information and obtain consent when the only purpose of read/write operations carried out on a user's terminal is to secure an authentication mechanism for the benefit of users (see CNIL, FR, September 27, 2021, Sanction, no. SAN-2021-013, published), the situation is different when these operations also pursue other purposes that are not strictly necessary for the provision of a service. The Google reCaptcha mechanism is not intended solely to secure the authentication mechanism for the benefit of users, but also enables Google to carry out analysis operations, as Google itself specifies in its general terms of use.


Then I have no idea. Sounds like more captcha vendors should be advertising themselves as "GDPR compliant".

Reminds me of "asbestos free" labeling: https://xkcd.com/641/


The website also advertises "made in Europe," so it seems like a national(-ish) trust/pride thing.


I'm pretty sure a session ID is personal data since it can be linked to a specific user by the service provider (see GDPR article 4(1)), and can be processed under the “legitimate interest” legal basis (article 6(1)f).

Cookies don't violate GDPR, but are subject to ePrivacy 2002/58/CE, article 5(3). “Strictly necessary” cookies (e.g. session ID cookies) are exempt from consent.


Fun fact: if you have user accounts, the act of logging in is literally the act that consents to storing and handling PII (unless you're so bad at writing a signup agreement that you forgot to put that in there).

If you don't require users to be logged in to serve content, e.g. the overwhelming majority of web content, then a visitor's session id, by definition, cannot be linked to their personal information, because there is no personal information to link to.

However, if a session ID is used to track "the same user across different websites", building up a behavioural profile, THAT would require explicit consent. But since cookies are per-domain, and browsers have severely locked down cross-domain access, that's basically a non-existent concern (both a blessing and a curse: the wild west web is long gone, for better and for worse).


With a visitor session ID, you can identify a single user, so it's personal data under GDPR. Yes, even if you don't have a detailed profile of them. It's not even ambiguous, it's spelled out in article 4(1).


TLDR: hashcash as SaaS.



