
There have been some recent investigations into bitnets (1- or 2-bit weights for NNs, including LLMs) showing that a 1.58-bit weight (three values: -1, 0, 1; log2(3) ≈ 1.58) can achieve very good results. In practice that's stored as 2 bits. The problem is that doing 2-bit math on a CPU or GPU isn't going to be very efficient (lots of shifting & masking). But doing 2-bit math on an FPGA is really easy and space-efficient. Another bonus is that with weights restricted to -1, 0 and 1, many of the multiplications inside the matrix multiplies are replaced by additions and subtractions. Right now, if you want to investigate these smaller weight sizes, FPGAs are probably the best option.
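To make the shifting & masking point concrete, here is a minimal sketch in Python of a dot product over ternary weights packed four per byte. The 2-bit encoding (00 = 0, 01 = +1, 10 = -1) and the names pack_ternary and ternary_dot are my own assumptions for illustration, not taken from any bitnet paper or library.

```python
# Hypothetical 2-bit encoding (an assumption for illustration):
# 0b00 -> 0, 0b01 -> +1, 0b10 -> -1 (0b11 is unused).
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}


def pack_ternary(weights):
    """Pack ternary weights (-1, 0, 1) into bytes, four weights per byte."""
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        # Weight i goes into byte i // 4, at bit offset 2 * (i % 4).
        packed[i // 4] |= ENCODE[w] << (2 * (i % 4))
    return bytes(packed)


def ternary_dot(packed, activations):
    """Dot product of packed ternary weights with integer activations."""
    acc = 0
    for i, x in enumerate(activations):
        # Shift and mask to pull out one 2-bit weight code.
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        if code == 0b01:    # weight +1: add, no multiply
            acc += x
        elif code == 0b10:  # weight -1: subtract, no multiply
            acc -= x
        # code 0b00 is weight 0: skip the activation entirely
    return acc


weights = [1, -1, 0, 1, -1]
activations = [3, 5, 7, 2, 4]
packed = pack_ternary(weights)
result = ternary_dot(packed, activations)  # 3 - 5 + 0 + 2 - 4
```

Every weight costs a shift, a mask and a compare before the add or subtract can happen. On an FPGA the 2-bit fields can just be wired straight into the adder logic, so that unpacking overhead goes away.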

> High-level synthesis tools often result in fairly poor performance compared to writing Verilog or SystemVerilog.

Agreed.




I'm curious, do you have any intuition for what percent of the time is spent shifting & masking vs. adding & subtracting (int32s I think)? Probably about the same?



