
As far as I know, yes. https://arxiv.org/abs/2210.17323

"Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline."

This would be 175 billion 3-bit weights instead of 175 billion 16-bit (or 32-bit!) weights. It massively reduces the size of the model and makes loading it into RAM on consumer hardware feasible. The number of parameters stays the same.
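
A rough back-of-the-envelope sketch of what that buys in memory (plain Python; this ignores activations, the KV cache, and the small per-group scale/zero-point metadata that quantized formats carry):

    # Approximate weight storage for a 175B-parameter model at various bit-widths.
    params = 175e9
    for bits in (32, 16, 4, 3):
        gib = params * bits / 8 / 2**30
        print(f"{bits:>2}-bit weights: ~{gib:,.0f} GiB")
    # 32-bit: ~652 GiB, 16-bit: ~326 GiB, 4-bit: ~81 GiB, 3-bit: ~61 GiB

So the 3- and 4-bit builds are in the range a high-end consumer machine can actually hold, while the fp16 original is not.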




> https://arxiv.org/abs/2210.17323

I've read the paper and, to be honest, I'm not sure what to make of it. Their headline benchmark is perplexity on WikiText2, which is not particularly relevant to most users. If you look at the tables in appendix A.4, which cover some more relevant benchmarks, you'll sometimes find that straight RTN 4-bit quantisation beats both GPTQ and even the full 16-bit original! No explanation for this is given in the paper.
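
For reference, "straight RTN" is just round-to-nearest quantisation with a per-row scale and zero-point and no calibration data at all. A minimal PyTorch sketch (the asymmetric, per-row grouping here is my own assumption, not necessarily the paper's exact setup):

    import torch

    def rtn_quantize(w: torch.Tensor, bits: int = 4):
        # Round-to-nearest with one (scale, zero-point) pair per output row.
        qmax = 2 ** bits - 1
        wmin = w.min(dim=1, keepdim=True).values
        wmax = w.max(dim=1, keepdim=True).values
        scale = (wmax - wmin).clamp(min=1e-8) / qmax
        zero = torch.round(-wmin / scale)
        q = torch.clamp(torch.round(w / scale) + zero, 0, qmax).to(torch.uint8)
        return q, scale, zero

    def rtn_dequantize(q, scale, zero):
        return (q.float() - zero) * scale

    w = torch.randn(512, 512)
    q, s, z = rtn_quantize(w, bits=4)
    print("max abs error:", (w - rtn_dequantize(q, s, z)).abs().max().item())

GPTQ differs in that it chooses the rounding for each weight using second-order information from a small calibration set, rather than rounding every weight independently.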


Some of those benchmarks have a pretty small sample size, IIRC; it might just be coincidence that the noise introduced by RTN happens to slightly improve them.

GPTQ beats RTN on almost every benchmark at almost every size, though.


I wonder if reducing the bit depth of parameters, the way we have been, acts as a kind of regularization in these huge deep models.


The number of parameters stays the same, but the amount of information encodable by those parameters is not the same.
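
Concretely, the number of distinct values each individual weight can take shrinks dramatically (for the float formats this counts raw bit patterns, so it slightly overstates the usable values):

    # Distinct levels a single weight can represent at each bit-width.
    for bits in (3, 4, 16, 32):
        print(f"{bits:>2} bits -> {2 ** bits:,} levels per weight")
    # 3 bits -> 8, 4 bits -> 16, 16 bits -> 65,536, 32 bits -> 4,294,967,296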


But they have to expand it back out to actually use it, right? Or does NVIDIA support 3-bit matrix multiplication?
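
As far as I understand, it's the former in spirit: the packed low-bit codes are unpacked and dequantized to fp16 right at the point of use (often fused into the matmul kernel), so the full-precision weight matrix never has to exist in memory all at once. A toy illustration of the idea, with made-up names and layout rather than any particular kernel's API:

    import torch

    # Toy dequantize-then-matmul: 4-bit codes are expanded to floats just
    # before the GEMM. Production kernels fuse this unpacking into the
    # matmul so the full fp16 weight matrix never materializes.
    def matmul_quantized(x, q, scale, zero):
        w = (q.float() - zero) * scale      # dequantize on the fly
        return x @ w.t()

    q = torch.randint(0, 16, (512, 256), dtype=torch.uint8)  # fake 4-bit codes
    scale = torch.rand(512, 1) * 0.1
    zero = torch.full((512, 1), 8.0)
    x = torch.randn(1, 256)
    print(matmul_quantized(x, q, scale, zero).shape)  # torch.Size([1, 512])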



