
A language model estimates the probability of a sequence of words, P(w_1, ..., w_n), or equivalently (via the chain rule) the per-word conditional probabilities P(word | context).

For compression, higher-probability word sequences should be encoded with shorter codes (ideally about -log2(p) bits for a sequence of probability p), so there is a direct relationship. A well-known method for constructing such codes from probabilities is Huffman coding.
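For example, a minimal Huffman sketch in Python over a toy unigram distribution (the words and probabilities are made up for illustration):

    import heapq

    def huffman_codes(probs):
        # Each heap entry: (probability, tiebreaker, {word: code-so-far}).
        heap = [(p, i, {w: ""}) for i, (w, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            p1, _, left = heapq.heappop(heap)   # pop the two least likely subtrees
            p2, _, right = heapq.heappop(heap)
            merged = {w: "0" + c for w, c in left.items()}
            merged.update({w: "1" + c for w, c in right.items()})
            heapq.heappush(heap, (p1 + p2, tiebreak, merged))
            tiebreak += 1
        return heap[0][2]

    # Toy "language model": the more probable a word, the shorter its code.
    print(huffman_codes({"the": 0.4, "cat": 0.2, "sat": 0.2, "on": 0.1, "mat": 0.1}))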

This works whether you estimate the probabilities with a simple statistical language model based on word frequencies or with an LLM. The better your language model (i.e., the lower its perplexity), the shorter the compressed output will be.
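Rough sketch of that relationship: an ideal coder spends about -log2(p) bits on a token the model gave probability p, so the total comes out to n * log2(perplexity) bits. The per-token probabilities below are hypothetical, purely to show the effect:

    import math

    def ideal_compressed_bits(token_probs):
        # Bits an ideal (e.g. arithmetic) coder spends: -log2(p) per token.
        return sum(-math.log2(p) for p in token_probs)

    # Hypothetical probabilities two models assign to the same four tokens.
    weak_model   = [0.05, 0.10, 0.08, 0.05]   # higher perplexity
    strong_model = [0.30, 0.50, 0.40, 0.25]   # lower perplexity

    print(ideal_compressed_bits(weak_model))    # ~15.6 bits
    print(ideal_compressed_bits(strong_model))  # ~6.1 bits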

Conversely, you can probably argue that a compression algorithm implicitly defines a language model through its code lengths (an input encoded in k bits is implicitly assigned probability about 2^-k), e.g., it assumes strings with repeated substrings are more likely than random noise.
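You can make that concrete by reading a compressor's output length as an implied -log2 P; here with zlib as a stand-in for any general-purpose compressor:

    import zlib

    def implied_neg_log2_prob(data: bytes) -> float:
        # Treat the compressed size (in bits) as the implied -log2 P(data).
        return 8 * len(zlib.compress(data))

    repetitive = b"abcabcabcabcabcabcabcabcabcabc"
    random_ish = b"q9w!zX7#kLp2@mR5^vB8&nT3*cY6$d"

    # The repetitive string compresses to fewer bits, so the model implied
    # by zlib's code lengths treats it as more probable than the noise.
    print(implied_neg_log2_prob(repetitive), implied_neg_log2_prob(random_ish))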
