I strongly suspect someone from the ad industry will start offering an option to serve a heavily obfuscated WASM-based rendering engine to render the website, with the obligatory promises that it "protects the integrity of your content", "stops the AI crawl theft", and of course it also "lowers development costs by ensuring consistent rendering across all platforms".
Isn't that just Flutter (or other WASM GUIs like Slint in Rust)? The issue is there's no SEO, so until that's fixed I see no way it would work. And if it does have SEO, then any bot can scrape it anyway.
Don't serve your website as PNGs wrapped in a single page application.
Also, if the main reason for blocking bots is to reduce server load, this solution is going to require running multiple browser instances on a server, which will require a lot of resources just to serve normal traffic.
Edit: I should also mention this is going to chew through bandwidth.
Literally anything other than converting websites to images. For example: CAPTCHAs, proof-of-work challenges, bot detection using techniques such as TLS fingerprinting, blacklisting known VPS/VPN IPs, etc.
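For anyone unfamiliar with the proof-of-work option: the server hands out a random challenge, the client burns a little CPU finding a nonce whose hash has N leading zero bits, and the server verifies the answer with a single hash. A minimal sketch (names and difficulty are mine, not from any particular library):

```python
import hashlib
import secrets

DIFFICULTY_BITS = 12  # leading zero bits required; tune so clients spend ~100ms

def make_challenge() -> str:
    """Server side: issue a random, unguessable challenge string."""
    return secrets.token_hex(16)

def verify(challenge: str, nonce: int) -> bool:
    """Server side: O(1) check — a single hash, no brute force."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    value = int.from_bytes(digest, "big")
    return value >> (256 - DIFFICULTY_BITS) == 0

def solve(challenge: str) -> int:
    """Client side: brute-force a nonce; ~2**DIFFICULTY_BITS hashes on average."""
    nonce = 0
    while not verify(challenge, nonce):
        nonce += 1
    return nonce
```

The asymmetry is the point: verification costs the server one hash, while a scraper hammering thousands of pages pays the solve cost every time.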
Well, I’m not going to use this for its intended purpose, however I AM going to try to see if I can use it to pipe webpages to a simple touchscreen e-ink display and retain interactivity.
I’ve actually seriously been thinking of using WebAuthn to “authenticate” every single page load with a passkey unlocked by a biometric device only, so that I can be sure that every single page load had a meat finger on TouchID or a meat face in front of FaceID before showing the page to them.
In the future I imagine that there will be biometrically secure browsers that will be required for top security applications, that can guarantee that a single physical person is actually physically present while using it.
WebAuthn doesn't reveal whether someone's a human, and you can't see whether they used FaceID or anything else. And once someone proves they're human, you should just give them a session token.
That'll just exclude most of your visitors. The visitors you're not refusing are using TPMs or secure elements you can't trust anyway, given the long history of side-channel attacks against "known-good" hardware manufacturers.
WebAuthn doesn't work the way you think it does. Most computers don't have fingerprint readers or even webcams, let alone webcams capable of verifying a user. Instead, you'll end up making people type in their Windows password or scanning QR codes.
Cloudflare ran an experiment with it, and I think Apple and Cloudflare are working to make it into a proper RFC. It's only a matter of time before other browsers will support it too with the way things are going right now.
FWIW using WebAuthn to start a session, set up a cookie, and validating that cookie to get access seems like a pretty usable pattern. Not much more invasive than the "checking your connection" screen Cloudflare likes to throw.
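The pattern described above boils down to: run the WebAuthn ceremony once, then mint a signed, expiring cookie and check only that on later requests. A rough server-side sketch (HMAC cookie format and names are my own illustration; assumes a user ID with no `|` characters):

```python
import hashlib
import hmac
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # persist/rotate properly in a real deployment
SESSION_TTL = 3600                    # seconds the session stays valid

def issue_session_cookie(user_id: str) -> str:
    """Called once, only after the WebAuthn assertion has been verified."""
    expires = str(int(time.time()) + SESSION_TTL)
    payload = f"{user_id}|{expires}"
    sig = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def validate_session_cookie(cookie: str) -> bool:
    """Every subsequent request checks the cookie; no WebAuthn ceremony needed."""
    try:
        user_id, expires, sig = cookie.split("|")
        payload = f"{user_id}|{expires}"
        expected = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(sig, expected) and int(expires) > time.time()
    except ValueError:
        return False
```

Because the expiry is inside the signed payload, a client can't extend its own session by editing the cookie.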
That sounds awful. From there it's a minuscule step to needing to authenticate with every website, like a cookie you can't erase if you want to continue using the internet.
Search engines have publicly listed IP ranges and user agents, so setting up an HTML whitelist for the search engines you do like shouldn't be impossible. Especially now that Google no longer presents its cache to visitors.
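User agents alone are trivially spoofed, which is why the major engines document a reverse-DNS check: resolve the IP to a hostname, check it's under the crawler's domain, then forward-resolve the hostname and confirm it maps back to the same IP. A sketch under those assumptions (the suffix list here is illustrative, not exhaustive):

```python
import socket

# Illustrative suffixes for Googlebot and Bingbot; check each engine's
# documentation for the authoritative list.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_crawler(ip: str) -> bool:
    """Reverse-DNS the IP, check the hostname suffix, then forward-confirm
    that the hostname resolves back to the same IP (defeats PTR spoofing)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
    return ip in forward_ips
```

The forward-confirmation step matters: anyone can set a PTR record claiming to be `crawler.googlebot.com`, but only Google controls what that name resolves to.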