
As a high-load systems engineer you'd want to offload asset serving to a CDN, which makes detection slightly more complicated. The easy way is to attach an image onload handler in client-side JS, but that would yield a lot of false positives. I personally have never seen such an approach and doubt it's useful for many concerns.





Unless organization policy forces you to, you do not have to put all resources behind a CDN. As a matter of fact, getting this heuristic to work requires a non-optimal caching strategy for one or more real or decoy resources, CDN or not. "Easy" is not an option in the bot/anti-bot arms race; all the low-hanging fruit is gone when you're fighting a determined adversary on either end.
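To make the heuristic concrete, here is a minimal server-side sketch of the bookkeeping, assuming a decoy resource served with caching disabled so that a real browser re-fetches it on every page view. Every name here (`DecoyTracker`, `record_page`, `record_decoy`, `is_suspect`) and both thresholds are hypothetical illustrations, not anything from a real deployment.

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumption: the decoy is served with caching disabled, so a real browser
# requests it again on every page view (the "non-optimal caching strategy").
DECOY_HEADERS = {"Cache-Control": "no-store"}


@dataclass
class ClientStats:
    pages: int = 0   # HTML page views seen from this client
    decoys: int = 0  # decoy resource fetches seen from this client


@dataclass
class DecoyTracker:
    # Both thresholds are illustrative guesses, not tuned values.
    min_pages: int = 5       # do not judge a client on too few page views
    min_ratio: float = 0.2   # decoy fetches per page below this look suspicious
    stats: Dict[str, ClientStats] = field(default_factory=dict)

    def _get(self, client_id: str) -> ClientStats:
        return self.stats.setdefault(client_id, ClientStats())

    def record_page(self, client_id: str) -> None:
        """Call when the client requests an HTML page."""
        self._get(client_id).pages += 1

    def record_decoy(self, client_id: str) -> None:
        """Call when the client requests the decoy resource."""
        self._get(client_id).decoys += 1

    def is_suspect(self, client_id: str) -> bool:
        """True if the client loads pages but rarely or never the decoy."""
        s = self.stats.get(client_id)
        if s is None or s.pages < self.min_pages:
            return False  # unknown client or not enough evidence yet
        return s.decoys / s.pages < self.min_ratio
```

How you derive `client_id` (IP, session cookie, TLS fingerprint, some combination) is left open on purpose. A flag from `is_suspect` is best treated as one signal among several rather than a verdict, since image-blocking extensions and text-mode browsers would trip it too.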

> I personally have never seen such an approach and doubt it's useful for many concerns.

It's an arms race and defenders are not keen on sharing their secret sauce, though I can't be the only one who thought of this rather basic bot characteristic; multiple abuse teams probably realized this decades ago. It works pretty well against the low-resource scrapers with fake UA strings and all the right TLS handshakes. It won't work against headless browsers, which cost scrapers more in resources and bandwidth, and there are specific countermeasures for headless browsers [1], and counter-countermeasures. It's a cat-and-mouse game.

1. e.g. mouse movement (made famous as one signal evaluated by Google's reCAPTCHA v2), monitor resolution, window size and position, and Canvas rendering, all of which have been gradually degraded by browser anti-fingerprinting efforts. The bot war is fought on the long tail.



