I've seen papers on 1.58-bit LLMs with ternary weights (-1, 0, and 1), and they show good accuracy at much smaller model sizes.
But I haven't been able to find any LLMs with strictly 1.00-bit weights (just the values 0 and 1). I guess one would represent negative values by keeping separate "positive" and "negative" weight matrices, then adding/subtracting the results of the two multiplications. To me it looks like a very efficient solution:
1) huge memory and energy savings,
2) computationally, a dot product is just POPCNT(A & B), and the matrices could be laid out very efficiently in memory (a 64-byte cache line holds 512 weights!), so matrix multiplication should be very fast too (see the sketch after this list),
3) should run very fast on a CPU,
4) there should be no precision reduction compared to a 1.58-bit LLM.
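To make point 2 concrete, here's a rough sketch in C of how I imagine the inner kernel would look, assuming bit-packed binary activations and the "positive"/"negative" weight bit-planes I described above (the names binary_dot, w_pos, w_neg are just placeholders, not from any actual implementation):

    #include <stdint.h>
    #include <stddef.h>

    /* Rough sketch, not from any paper: activations packed 64 per word,
       weights split into "positive" and "negative" bit-planes. */
    static int32_t binary_dot(const uint64_t *act,
                              const uint64_t *w_pos,
                              const uint64_t *w_neg,
                              size_t n_words)   /* n_words = dim / 64 */
    {
        int32_t acc = 0;
        for (size_t i = 0; i < n_words; i++) {
            /* +1 weights: count positions where both bits are set */
            acc += __builtin_popcountll(act[i] & w_pos[i]);
            /* -1 weights: subtract positions where both bits are set */
            acc -= __builtin_popcountll(act[i] & w_neg[i]);
        }
        return acc;
    }

If that's roughly right, each output element costs only a couple of AND + POPCNT instructions per 64 weights, which is what makes me think it should be CPU-friendly.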
What are the downsides of this approach? Where could I read about it?