
I don't think there is anything conceptually new in this work, other than that it is applied to LLMs.

But in fairness, getting these techniques to work at scale is no small feat. In my experience, quantization-aware training at these low bit depths was always finicky and required a very careful hand. I'd be interested to know whether it has become easier now that LLMs have so many more parameters.
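For context, the core trick in quantization-aware training is "fake quantization": weights are snapped to a low-bit grid in the forward pass so the network trains against the quantization error (gradients typically flow through unchanged via a straight-through estimator, which is where much of the finickiness lives). A minimal sketch of the forward-pass part, with illustrative names and a symmetric per-tensor scheme chosen for simplicity:

```python
import numpy as np

def fake_quantize(x, bits=4):
    """Simulate low-bit quantization: map values onto a symmetric
    signed integer grid, then back to floats, so the forward pass
    sees quantized weights while the master weights stay in float."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(x)) / qmax     # per-tensor scale factor
    if scale == 0:
        return x                         # all-zero tensor: nothing to do
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale                     # dequantize back to float

w = np.array([0.9, -0.31, 0.02, -0.7])
w_q = fake_quantize(w, bits=4)           # w_q lies on 4-bit grid points
```

At 4 bits the grid is coarse (16 levels per tensor), which is why small changes in the scale estimate can move many weights to a different level and destabilize training.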

In any case full kudos to the authors and I'm glad to see people continuing this work.



