Maybe you've heard of the "can one hear the shape of a drum" question. In the '90s, mathematicians successfully produced two drum shapes that would theoretically produce indistinguishable sets of frequencies when struck. Somehow, no one ever got around to actually making the drums. So we gave it a shot!
Maybe I'm wrong, but it looks like the authors did not actually have any LLMs write or verify any code for their experiments. Instead, they simulated their simplified Markov chain model and checked whether the theorem's predictions matched the empirical statistics. That amounts to a test not of their model, but of basic Markov chain theory.
Also, the mathematical content here is pretty thin. Their main theorem has nothing to do with LLMs directly. It's a theorem about a five-state Markov chain, and the proof follows from standard Markov chain theory.
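To make the complaint concrete, here's a sketch of what "testing the theorem empirically" amounts to. The transition matrix below is hypothetical (the paper's actual five-state chain isn't reproduced here): you simulate a long trajectory and compare empirical state frequencies to the stationary distribution that standard Markov chain theory already guarantees.

```python
import numpy as np

# Hypothetical 5-state transition matrix (NOT the paper's); each row sums to 1.
P = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0],
    [0.2, 0.3, 0.5, 0.0, 0.0],
    [0.0, 0.2, 0.3, 0.5, 0.0],
    [0.0, 0.0, 0.2, 0.3, 0.5],
    [0.1, 0.0, 0.0, 0.2, 0.7],
])

# Stationary distribution: the left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

# Empirical state frequencies from a long simulated trajectory.
rng = np.random.default_rng(0)
state, n = 0, 200_000
counts = np.zeros(5)
for _ in range(n):
    state = rng.choice(5, p=P[state])
    counts[state] += 1

# The gap shrinks like 1/sqrt(n): that's the ergodic theorem, not a new result.
print(np.max(np.abs(counts / n - pi)))
```

Of course the two agree; any irreducible, aperiodic finite chain behaves this way. That's the sense in which the experiment verifies textbook theory rather than the model.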
For those reasons, the grandiose name "LLM-Verifier Convergence Theorem" does not sit well with me.
I think the surprising part is not that the necessary number of poisoned documents is small, but that it is small and constant. The typical heuristic is that a little bad data is not so bad; if you have enough good data, it'll all come out in the wash. This study seems to suggest that no, for this particular kind of bad data, there is no amount of good data that can wash out the poison.
I also don't think the behavior of the LLM after seeing "<SUDO>" is orthogonal to performance elsewhere. Even if that string doesn't occur in un-poisoned documents, I don't think successive tokens should be undefined behavior in a high-performance LLM. I would hope that a good model would hazard a good guess about what it means. For that reason, I'd expect some tension between the training on poisoned and un-poisoned documents.
Mathematicians are afraid of higher order tensors because they are unruly monsters.
There's a whole workshop of useful matrix tools. Decompositions, spectral theory, etc. These tools really break down when you generalize them to k-tensors. Even basic concepts like rank become sticky. (IIRC, the set of 3-tensors of tensor rank ≤ k is not even topologically closed in general. Terrifying.) If you hand me some random 5-tensor, it's quite difficult even to begin to understand it without somehow turning it into a matrix first by flattening or slicing or whatever.
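A tiny illustration of the "turn it into a matrix first" move: mode-1 unfolding of a 3-tensor, i.e. flattening all but the first axis so that matrix tools like rank apply. The example tensor here is made up; the caveat is that matrix ranks of unfoldings only lower-bound the tensor rank, which is part of why flattening never tells the whole story.

```python
import numpy as np

# A rank-1 3-tensor: the outer product of three vectors.
a, b, c = np.arange(1, 3), np.arange(1, 4), np.arange(1, 5)
T = np.einsum('i,j,k->ijk', a, b, c)   # shape (2, 3, 4)

# Mode-1 unfolding: flatten the last two axes into one, giving a 2x12 matrix.
T1 = T.reshape(2, 12)

# Now ordinary matrix machinery applies. For this tensor the unfolding
# has matrix rank 1, matching the tensor rank -- but in general an
# unfolding's rank only gives a lower bound on the tensor rank.
print(np.linalg.matrix_rank(T1))  # 1
```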
Don't get me wrong. People work with these things. They do their best. But in general, mathematicians are afraid of higher order tensors. You should be too.
I'm always surprised how many other mathematicians don't know what I'm talking about when I reference this paper. It should be in the canon of math essays.
https://youtu.be/XpnNVWOC98A