Hacker News
Stop the Bots: Practical Lessons in Machine Learning (cloudflare.com)
56 points by new_here 9 days ago | 9 comments

"we issued more than 660,000 challenges, of which only 0.32% were solved — meaning our algorithm detected bots with a 99.68% rate of True Positives."

Isn't that a bit optimistic? The captcha might have driven people away. Especially on mobile, I'm not too keen on clicking photo tiles containing storefronts. Is there a way to detect false positives?
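To make this objection concrete, here is a back-of-the-envelope sketch. The quoted 99.68% is simply 1 − 0.32%, which treats every unsolved challenge as a bot. Strictly, that is a precision-style figure: the share of challenged requests that were bots. The `human_abandon_rate` parameter and the 5% value below are hypothetical. They only show how the figure moves if some humans gave up instead of solving.

```python
def estimated_precision(issued: int, solve_rate: float, human_abandon_rate: float = 0.0) -> float:
    """Fraction of challenged requests that were actually bots.

    Assumes every solved challenge came from a human (a false positive).
    `human_abandon_rate` is a hypothetical share of the *unsolved*
    challenges that were humans who gave up rather than bots.
    """
    solved = issued * solve_rate
    unsolved = issued - solved
    bots = unsolved * (1.0 - human_abandon_rate)
    return bots / issued


issued = 660_000
# The quoted figure: every unsolved challenge counted as a bot.
print(f"{estimated_precision(issued, 0.0032):.4f}")
# Hypothetical: 5% of unsolved challenges were humans who abandoned the captcha.
print(f"{estimated_precision(issued, 0.0032, 0.05):.4f}")
```

With no abandonment the sketch reproduces the quoted 0.9968. A hypothetical 5% abandonment rate lowers it to about 0.947. This is why the unsolved count alone can't separate bots from frustrated humans.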

I know some people from Cloudflare frequent HN, so I'll ask here: how are you guys getting the data to label?

You said you get it from the traffic you serve, but wouldn't that be a privacy issue? If I host a WordPress blog and use Cloudflare, does that mean a human could review a login request and potentially see a user's password?

(Disclaimer: I use Cloudflare for personal projects, and yes, I know Cloudflare could be recording or MitM-ing everything anyway, and it needs to MitM traffic to provide its service. Still, I generally trust them.)

No, a human does not have access to your password.

For the purposes of machine learning, we can do something like this: as a request passes through us, check whether it's a POST to /wp-admin or similar, and whether the response is a 200 or a 302 (which tells us whether the login worked). All of that is done by code, not people. We use that as a label, "good login" or "bad login", then look for characteristics that come with lots of "bad login" events and use those to predict what's a bot.
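The labeling scheme described above can be sketched as a toy frequency count. This is not Cloudflare's model. Several details are assumptions:

- The `Request` type and its single `feature` field are hypothetical stand-ins for whatever request characteristics are used.
- The path check adds `/wp-login.php`, which is where WordPress's login form posts.
- The status mapping assumes WordPress behaviour: a successful login redirects (302), and a failed one re-renders the form (200).
- `min_attempts` and `bad_ratio` are illustrative thresholds.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Request:
    method: str
    path: str
    status: int
    feature: str  # hypothetical request characteristic, e.g. a client fingerprint


def login_label(req: Request) -> Optional[str]:
    """Label a login attempt as good or bad; return None for other traffic.

    Assumption: WordPress-style login, where success redirects (302)
    and failure re-renders the login form (200).
    """
    if req.method != "POST" or not req.path.startswith(("/wp-login.php", "/wp-admin")):
        return None
    if req.status == 302:
        return "good login"
    if req.status == 200:
        return "bad login"
    return None


def suspicious_features(
    requests: Iterable[Request], min_attempts: int = 20, bad_ratio: float = 0.9
) -> Set[str]:
    """Return feature values whose login attempts are overwhelmingly failures."""
    attempts: Counter = Counter()
    bad: Counter = Counter()
    for req in requests:
        label = login_label(req)
        if label is None:
            continue  # not a login attempt, so it carries no label
        attempts[req.feature] += 1
        if label == "bad login":
            bad[req.feature] += 1
    # Require enough volume before trusting the ratio.
    return {
        f for f, n in attempts.items()
        if n >= min_attempts and bad[f] / n >= bad_ratio
    }
```

In a real system, the flagged characteristics would more likely become features for a trained classifier than a hard blocklist. The counting step is what turns code-generated labels into a signal without anyone reading the requests.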

Makes sense, thanks for replying! A lot of companies don't communicate much; it's refreshing to have the CTO reply with a real answer.

:-) I have code that detects any mention of Cloudflare on HN and emails me directly. Latency is a few seconds from post to email. Thanks for being a customer.
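The comment doesn't say how that tool works. One minimal way to build something similar is to poll Hacker News's public Firebase API, using its `item` and `maxitem` endpoints, and run a case-insensitive regex over each new item. The function names here are hypothetical, and the email step is left out.

```python
import json
import re
import urllib.request
from typing import List, Optional, Tuple

HN_ITEM_URL = "https://hacker-news.firebaseio.com/v0/item/{}.json"
HN_MAXITEM_URL = "https://hacker-news.firebaseio.com/v0/maxitem.json"
# Case-insensitive whole-word match: catches "CloudFlare" and "blog.cloudflare.com".
MENTION = re.compile(r"\bcloudflare\b", re.IGNORECASE)


def fetch_json(url: str):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def mentions_cloudflare(item: Optional[dict]) -> bool:
    """True if a story title, URL or comment text mentions Cloudflare."""
    if not item:
        return False
    fields = (item.get("title"), item.get("url"), item.get("text"))
    return any(f and MENTION.search(f) for f in fields)


def new_mentions(last_seen: int) -> Tuple[List[int], int]:
    """Scan items newer than `last_seen`; return (matching ids, newest id checked)."""
    newest = fetch_json(HN_MAXITEM_URL)
    hits = [
        i for i in range(last_seen + 1, newest + 1)
        if mentions_cloudflare(fetch_json(HN_ITEM_URL.format(i)))
    ]
    return hits, newest
```

Calling `new_mentions` in a loop every few seconds and emailing any hits would give roughly the latency described.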

Since you’re watching this space: the data you have on website attacks would also be valuable for detecting phishing attacks.


But isn’t this assuming HTTP?

Cloudflare acts as a reverse proxy. It terminates HTTPS connections, meaning it still has access to the decrypted request and response.

