I built a lossy text compressor in the days before LLMs.
I used a word embedding to map the text into a space where nearby tokens had similar semantic meaning, then I modified an ordinary LZ encoder to choose cheaper tokens when they were 'close enough' according to some tunable loss parameter.
It "worked", but was better at producing amusing outputs than any other purpose. Perhaps you wouldn't have considered that working!
In terms of a modern implementation using an LLM, I would think I could improve the retention of details like that by adapting the loss parameter to the flatness of the model's predicted distribution. E.g. for a date the model may be confident that the characters are digits but pretty uniform over which digit. Though I bet the details you want to preserve carry a lot of the document's actual entropy.
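
Something like this is what I have in mind for the flatness part. The `probs` distribution would come from whatever LLM you're using at the current position, so treat the interface as hypothetical:

    import math

    def adaptive_loss(probs, base_loss=0.2):
        """Shrink the loss budget when the model's next-token distribution is flat.

        A flat distribution means the model can't reconstruct the detail on its
        own (e.g. which digit of a date), so coding should go near-lossless there;
        a peaked distribution means a substitution is cheap and probably safe.
        """
        entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
        max_entropy = math.log2(len(probs))                              # entropy of a uniform distribution
        flatness = entropy / max_entropy if max_entropy > 0 else 0.0     # 0 = certain, 1 = uniform
        return base_loss * (1.0 - flatness)

    # Model is sure it's a digit but uniform over which digit: loss budget drops to ~0, keep it exact.
    print(adaptive_loss([0.1] * 10))
    # Model strongly expects one token: most of the base loss budget remains for substitution.
    print(adaptive_loss([0.9] + [0.1 / 9] * 9))
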