*> > 5. If a client is a known LLM range, inject texts like …* *> I would sugges...

cookiengineer · 2024-10-25T12:52:43

I'm currently working on a project that's loosely related to what you were discussing. I'm building a webfont generator that I call "enigma-webfont", because it uses a series of rotations as a seed to "cipher" the text in the HTML, making it useless for LLMs while keeping it readable for humans.

Without the webfont (which basically acts like a session), the text itself is useless for any kind of machine processing, because it contains the shifted characters as UTF-8. The characters are then shifted back by a custom webfont whose seed matches that of the served HTML but differs for each client. If a non-bot user is detected, it currently just sets the seed/shift to 0 and serves the real plaintext, but that's optional, as the user doesn't notice a difference (except maybe when copying and pasting).
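A minimal sketch in Python of the cipher side of this idea, assuming the "series of rotations" is applied separately per character class and that the webfont remaps glyphs only through its character map, so the substitution has to be identical at every position on the page. `build_table`, `encipher`, and `font_cmap` are hypothetical names for illustration; this is not the actual enigma-webfont code.

```python
import random
import string

# Hypothetical character classes; each is rotated independently (an
# assumption of this sketch, not necessarily what enigma-webfont does).
CLASSES = (string.ascii_lowercase, string.ascii_uppercase, string.digits)


def build_table(seed: int, rounds: int = 3) -> dict[str, str]:
    """Per-session substitution table: plaintext char -> served char."""
    rng = random.Random(seed)
    table = {}
    for alphabet in CLASSES:
        n = len(alphabet)
        # A series of seeded rotations; composed over one alphabet they
        # collapse to a single rotation by the sum of the offsets (mod n).
        # Seed 0 means "no shift": the plaintext is served unchanged.
        shift = 0 if seed == 0 else sum(rng.randrange(n) for _ in range(rounds)) % n
        for i, ch in enumerate(alphabet):
            table[ch] = alphabet[(i + shift) % n]
    return table


def encipher(text: str, table: dict[str, str]) -> str:
    """Text as it goes into the served HTML; chars outside the table pass through."""
    return "".join(table.get(ch, ch) for ch in text)


def font_cmap(table: dict[str, str]) -> dict[str, str]:
    """Inverse mapping the per-session webfont needs:
    served code point -> glyph of the original character."""
    return {served: plain for plain, served in table.items()}


if __name__ == "__main__":
    table = build_table(seed=42)
    served = encipher("Hello, World 2024", table)
    cmap = font_cmap(table)
    rendered = "".join(cmap.get(ch, ch) for ch in served)
    print(served)    # what a scraper reading the raw HTML gets
    print(rendered)  # what a visitor sees once the webfont is applied
```

Note that stacking plain rotations over the same alphabet is equivalent to a single rotation, so a real generator would probably want a stronger per-session permutation. Building the actual font file (for example by rewriting the `cmap` table of an existing font) is left out here.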

For me, this was the only web technology I could come up with that offers a way to serve "machine-readable" and "human-readable" content differently and to tell the two apart. Anything based on, e.g., the WebCrypto API or other code would be easily bypassed, because it can run in headless browser instances.

Taking screenshots in headless Chrome would kind of work to bypass this, but luckily OCR is currently kinda shitty, and the development cost of something like that would explode compared to just adding another rotation mechanism to the webfont :D

dspillett · 2024-10-25T15:02:53

> If you detect a non-bot user, it's currently just setting the seed/shift to 0, and serves the real plaintext, but that's optional as the user doesn't notice a difference (only maybe in the copy/paste function).

You would probably have to keep that for accessibility purposes. But then, however you detect bot vs. non-bot could easily be tricked by a good bot: the CAPTCHA arms race is currently at a point where such checks exclude more human requests than automated ones…