Daubechies and Sweldens wrote a great expository paper on these kinds of filter implementations for the general case: https://9p.io/who/wim/papers/factor/factor.pdf
Interestingly enough, Sweldens does not seem to cite Loeffler's paper, so they likely arrived at the same method independently.
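One way to see the connection is a plane rotation, the basic building block of these butterfly-style filters. Any rotation by θ (with sin θ ≠ 0) factors into three lifting (shear) steps: [[c, -s], [s, c]] = [[1, a], [0, 1]] · [[1, 0], [s, 1]] · [[1, a], [0, 1]], where c = cos θ, s = sin θ and a = (c - 1)/s = -tan(θ/2). The Python below is a minimal sketch of that identity, assuming this is the kind of factorization meant; the function names `rotate_direct` and `rotate_lifting` are hypothetical, and this is not claimed to be Loeffler's exact formulation.

```python
import math

def rotate_direct(x, y, theta):
    """Plain 2x2 rotation: 4 multiplications, 2 additions."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y

def rotate_lifting(x, y, theta):
    """Same rotation as three lifting (shear) steps: 3 multiplications, 3 additions.

    Assumes sin(theta) != 0.
    """
    s = math.sin(theta)
    a = (math.cos(theta) - 1.0) / s  # equals -tan(theta / 2)
    x = x + a * y  # upper shear
    y = y + s * x  # lower shear
    x = x + a * y  # upper shear
    return x, y

def unrotate_lifting(x, y, theta):
    """Inverse: undo each lifting step in reverse order by subtracting."""
    s = math.sin(theta)
    a = (math.cos(theta) - 1.0) / s
    x = x - a * y
    y = y - s * x
    x = x - a * y
    return x, y
```

Each lifting step only adds a multiple of one coordinate to the other, so it can be undone exactly by subtracting the same quantity, which is part of why this form is attractive for integer or lossless variants.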
I read a couple of papers and implemented some fast DCT algorithms. The Arai, Agui, and Nakajima 8-point DCT uses 13 multiplications. The Lee DCT algorithm is recursive and works on any length that is a power of 2. https://www.nayuki.io/page/fast-discrete-cosine-transform-al...
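The recursion in Lee's algorithm can be sketched briefly. It computes the unscaled DCT-II, X[k] = sum over n of x[n]·cos(π/N·(n + 0.5)·k), by folding the input: the sums x[n] + x[N-1-n] give the even outputs through a half-length DCT, and the differences, divided by 2·cos(π(n + 0.5)/N), give the odd outputs via X[2k+1] = H[k] + H[k+1] (with H[N/2] = 0). This is a minimal Python sketch under those assumptions; the names `dct_lee` and `dct_naive` are hypothetical and not taken from the linked page.

```python
import math

def dct_naive(x):
    """O(N^2) reference: unscaled DCT-II."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]

def dct_lee(x):
    """Recursive Lee-style DCT-II for lengths that are a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 0 or n % 2 != 0:
        raise ValueError("length must be a power of 2")
    half = n // 2
    # Fold the input around its midpoint.
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    # cos((i + 0.5) * pi / n) > 0 for i < half, so this never divides by zero.
    diffs = [(x[i] - x[n - 1 - i]) / (2.0 * math.cos((i + 0.5) * math.pi / n))
             for i in range(half)]
    even = dct_lee(sums)   # X[2k]
    odd = dct_lee(diffs)   # H[k], combined pairwise below
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = even[k]
        # X[2k+1] = H[k] + H[k+1], where H[half] is zero.
        out[2 * k + 1] = odd[k] + (odd[k + 1] if k + 1 < half else 0.0)
    return out
```

For example, `dct_lee([1.0, 2.0, 3.0, 4.0])` should match `dct_naive([1.0, 2.0, 3.0, 4.0])` to within floating-point error; each level does O(N) work with two half-size calls, giving O(N log N) overall.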
> From this code and staring at the paper, I learned a few things. First of all figure 1 is wrong.
Probably anyone who has implemented a non-trivial algorithm from a paper will feel a sympathetic twinge ;-)