There has been a some recent investigations into bitnets (1 or 2-bit weights for...

There has been a some recent investigations into bitnets (1 or 2-bit weights for NNs including LLMs) where they show that a 1.58 bit weight (with values: -1,0,1) can achieve very good results. Effectively that's 2 bits. The problem is that doing 2-bit math on a CPU or GPU isn't going to be very efficient (lots of shifting & masking). But doing 2-bit math on an FPGA is really easy and space-efficient. Another bonus is that many of the matrix multiplications are replaced by additions. Right now if you want to investigate these smaller weight sizes FPGAs are probably the best option.

> High-level synthesis tools often result in fairly poor performance compared to writing Verilog or SystemVerilog.

Agreed.