Ask HN: Why are 1.00-bit LLMs not used?
2 points by nlitened 9 days ago | 2 comments
I've seen papers on 1.58-bit LLMs with ternary weights -1, 0, and 1, and they report good accuracy at much smaller model sizes.

But I haven't been able to find any LLMs with strict 1.00-bit weights (just the values 0 and 1). I guess one would represent negative values by keeping separate "positive" and "negative" weight matrices and then adding/subtracting the results of the two multiplications. To me this looks like a very efficient approach:

1) huge memory and energy savings,

2) computationally, the dot product is just POPCNT(A & B), and the matrices can be laid out very compactly in memory (a 64-byte cache line holds 512 weights!), so matrix multiplication should be very fast too (see the sketch after this list),

3) should run very fast on a CPU,

4) there should be no loss of precision compared with a 1.58-bit LLM.
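
For concreteness, here is a rough sketch in C of what I mean by point 2, assuming the activations are also packed as single bits. The names (binary_dot, w_plus, w_minus) and the bit-plane layout are just my illustration, not taken from any paper:

    #include <stdint.h>

    /* Sketch only: one output of a 1-bit matrix-vector product.
     * A ternary weight row w in {-1, 0, 1} is stored as two bit-planes,
     * w = w_plus - w_minus, and the input x is a packed binary vector.
     * n_words 64-bit words per row: 8 words = 512 weights = one 64-byte cache line. */
    static int binary_dot(const uint64_t *w_plus, const uint64_t *w_minus,
                          const uint64_t *x, int n_words)
    {
        int acc = 0;
        for (int i = 0; i < n_words; i++) {
            acc += __builtin_popcountll(w_plus[i]  & x[i]);   /* +1 weights hitting a 1 input */
            acc -= __builtin_popcountll(w_minus[i] & x[i]);   /* -1 weights hitting a 1 input */
        }
        return acc;   /* integer pre-activation */
    }

So each output is just two POPCNT(A & B) passes, one per sign plane.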

What are the downsides of this approach? Where could I read about it?

I don't know what I'm talking about, but don't you need more than one bit to have non-linearity?

The results of 1-bit matrix multiplications (essentially, counts of set bits per row) are integers, and a non-linear activation function can then be applied to them, I think
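
Roughly like this, continuing the popcount sketch above (the threshold value is just illustrative), which also re-binarizes the output so the next layer's input stays 1-bit:

    /* Sketch: the integer accumulator from the binary dot product can pass
     * through any non-linearity; thresholding it re-binarizes the activation.
     * The threshold here is arbitrary, for illustration only. */
    static inline int step_activation(int acc, int threshold)
    {
        return acc > threshold ? 1 : 0;   /* non-linear step function on an integer input */
    }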


