You can make up for it by learning more parameters, even though each parameter is stored at a lower resolution. The tradeoff evidently works out favorably down to about 4 bits; I'm basing this on the results reported in the k-bit inference scaling laws paper by Tim Dettmers and Luke Zettlemoyer, see Figure 1 here [1].

[1] https://arxiv.org/abs/2212.09720
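
For intuition, here's a rough back-of-the-envelope sketch (my own illustration with an assumed 24 GB budget, not numbers from the paper): at a fixed memory budget, lower-precision weights let you fit proportionally more parameters, and the scaling-law result is that the bigger low-bit model tends to win down to roughly 4 bits.

    # Back-of-the-envelope illustration: how many parameters fit in a fixed
    # memory budget at a given weight precision. (Assumed 24 GB budget.)
    def params_that_fit(budget_gb: float, bits_per_param: float) -> float:
        bytes_total = budget_gb * 1024**3
        return bytes_total / (bits_per_param / 8)

    budget_gb = 24  # e.g. a single 24 GB GPU
    for bits in (16, 8, 4, 3):
        print(f"{bits}-bit: ~{params_that_fit(budget_gb, bits) / 1e9:.1f}B parameters")
    # 16-bit: ~12.9B, 8-bit: ~25.8B, 4-bit: ~51.5B, 3-bit: ~68.7B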

Has there been any quantitative benchmark of this?


I tried to find the article I saw that did exactly that, but couldn't. Empirically, though, if you look at the Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...) and add the "precision" column to the data, you can see that the GPTQ/4-bit/8-bit quants still handily beat the smaller models run at full precision. The downside is that there's no 3-bit option on the submission page, so we can't easily gauge how those are doing, but all my anecdotal personal experience with 3-bit has been extremely disappointing. Exllamav2 might have bridged that gap a bit. Again, I wish I could find that article for you; it laid all this out and showed a huge perplexity dropoff below 4-bit.

Here's a reddit post showing the 2.5-bit (exllamav2) quant performing very poorly, at least: https://www.reddit.com/r/LocalLLaMA/comments/16mif47/compari...
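
If anyone wants to run that kind of comparison themselves, here's a minimal sketch using the standard transformers API (the model names and the wikitext_sample variable are just illustrative placeholders, not taken from the reddit post): perplexity is exp of the mean next-token cross-entropy over a held-out text, so lower is better.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(model_id: str, text: str) -> float:
        tok = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
        ids = tok(text, return_tensors="pt").input_ids.to(model.device)
        with torch.no_grad():
            # With labels=ids, the model returns the mean next-token cross-entropy.
            loss = model(ids, labels=ids).loss
        return torch.exp(loss).item()

    # Illustrative comparison: a 4-bit quant of a larger model vs. a smaller fp16 model.
    # (wikitext_sample is a placeholder string of held-out evaluation text.)
    # print(perplexity("TheBloke/Llama-2-13B-GPTQ", wikitext_sample))
    # print(perplexity("meta-llama/Llama-2-7b-hf", wikitext_sample))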
