This approach cannot possibly be more efficient than running the original model, because it relies on running the original model: you need its activations before you can search the text corpus for strings with similar activations and compute their next-token statistics. You don't get to skip many steps, and you end up having to do a bunch of extra work on top.
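To make the cost structure concrete, here's a minimal sketch of the pipeline as I understand it, with toy random data standing in for the real corpus activations and next-token counts (all names and shapes here are my assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 1000 corpus positions, 6 layers, 8-dim activations, vocab of 50.
n_corpus, n_layers, d_model, vocab = 1000, 6, 8, 50
corpus_acts = rng.normal(size=(n_corpus, n_layers, d_model))    # precomputed offline
next_token_counts = rng.integers(0, 5, size=(n_corpus, vocab))  # per-position stats

def predict_next_token(prompt_acts, k=10):
    # Step 1 (not shown): prompt_acts must come from a full forward pass of
    # the original model, so nothing is saved on inference cost.
    # Step 2: the extra work -- nearest-neighbor search over corpus activations
    # (brute force here; a real system would use an ANN index).
    dists = np.linalg.norm(
        (corpus_acts - prompt_acts).reshape(n_corpus, -1), axis=1
    )
    neighbors = np.argsort(dists)[:k]
    # Step 3: aggregate the matched strings' next-token statistics.
    counts = next_token_counts[neighbors].sum(axis=0)
    return counts / counts.sum()

probs = predict_next_token(rng.normal(size=(n_layers, d_model)))
```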
I'd be surprised if doing this with two completely separate corpora, one for training the model and one for searching for strings with similar activations, didn't lead to much the same results, because the hard part is constructing similar activations for strings with similar next-token statistics in the first place.
Note that in the per-layer weights [0.01, 0.01, 0.1, 1.5, 6, 0.01] the penultimate layer is the most important, i.e. the point where the input has already been transformed a lot. So you can't expect to use this to replace a transformer with a simple grep over the training data. (My guess as to why the penultimate layer has a much higher weight than the final one is induction heads (https://transformer-circuits.pub/2021/framework/index.html), which implement copying of repeated strings from the input, with the penultimate layer determining what to look for and the final layer doing the copying.)
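Here's one way such weights might enter the similarity measure: a weighted sum of per-layer distances. The weighted-distance form itself is my assumption about how the weights are applied; only the weight values come from the post:

```python
import numpy as np

# Per-layer weights from the post; how they're applied is my guess.
layer_weights = np.array([0.01, 0.01, 0.1, 1.5, 6, 0.01])

def weighted_activation_distance(acts_a, acts_b):
    # acts_* have shape (n_layers, d_model). Compare layer by layer, then
    # weight the per-layer distances, so the overall match is dominated by
    # the penultimate layer (weight 6), not by raw early-layer similarity.
    per_layer = np.linalg.norm(acts_a - acts_b, axis=1)
    return float(per_layer @ layer_weights)
```

Under this metric, two strings whose early-layer (near token-level) activations agree but whose penultimate-layer activations diverge would still count as dissimilar, which is consistent with the point that a simple grep can't reproduce the lookup.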