
Multipliers for, e.g., 8-bit or 4-bit floating-point values should also be pretty cheap? (I assume the cost of a multiplier grows quadratically with the number of bits?)
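To make the quadratic guess concrete, here is a rough sketch of a schoolbook array multiplier that counts the AND gates needed to form the partial-product bits. The function name `array_multiply` is made up for illustration, and the gate count ignores the adder tree, so treat it as an order-of-magnitude model rather than a synthesis result. Note also that in a floating-point format only the significands go through the multiplier (the exponents are simply added), so the relevant width is smaller than the full 8 or 4 bits.

```python
def array_multiply(a: int, b: int, n_bits: int) -> tuple[int, int]:
    """Multiply two unsigned n-bit integers one bit pair at a time.

    Returns (product, and_gates), where and_gates is the number of
    partial-product bits formed. That count is always n_bits * n_bits.
    """
    if not (0 <= a < (1 << n_bits) and 0 <= b < (1 << n_bits)):
        raise ValueError("operands must fit in n_bits")

    product = 0
    and_gates = 0
    for i in range(n_bits):          # bit i of a
        for j in range(n_bits):      # bit j of b
            bit = ((a >> i) & 1) & ((b >> j) & 1)  # one AND gate
            and_gates += 1
            product += bit << (i + j)              # weight 2^(i+j)
    return product, and_gates


if __name__ == "__main__":
    for n in (4, 8, 16):
        top = (1 << n) - 1
        _, gates = array_multiply(top, top, n)
        print(f"{n:2d}-bit operands: {gates} partial-product bits")
```

Halving the operand width cuts the partial-product count by a factor of four in this model, which is the sense in which narrow formats are cheap.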



You use DSPs for that. Efinix has direct bfloat16 support in their FPGAs. The real game changer is using the carry chain with your LUT-based adders. Assuming 16 LUTs per multiplier, you could get about 11 teraops out of a Ti180 at a few watts. That is only a theoretical number, of course, but I could imagine using four FPGAs for speech recognition and synthesis, plus vision-based LLMs, all running in real time.
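For anyone who wants to sanity-check that kind of figure, a back-of-the-envelope sketch follows. Only the 16 LUTs comes from the comment above, read here as the cost of one multiplier. The logic-element count, the clock rate, and counting a multiply-accumulate as two operations are all placeholder assumptions, not datasheet values, and `peak_ops_per_second` is a hypothetical helper.

```python
def peak_ops_per_second(logic_elements: int,
                        luts_per_multiplier: int,
                        clock_hz: int,
                        ops_per_cycle: int = 1) -> int:
    """Upper bound on throughput if every LUT were spent on multipliers."""
    multipliers = logic_elements // luts_per_multiplier
    return multipliers * clock_hz * ops_per_cycle


# Placeholder inputs. Check the vendor datasheet before relying on them.
ASSUMED_LOGIC_ELEMENTS = 172_800    # assumption: approximate device size
LUTS_PER_MULTIPLIER = 16            # from the comment above
ASSUMED_CLOCK_HZ = 500_000_000      # assumption: 500 MHz fabric clock
OPS_PER_MAC = 2                     # assumption: multiply + add counted separately

if __name__ == "__main__":
    ops = peak_ops_per_second(ASSUMED_LOGIC_ELEMENTS,
                              LUTS_PER_MULTIPLIER,
                              ASSUMED_CLOCK_HZ,
                              OPS_PER_MAC)
    print(f"{ops / 1e12:.1f} teraops (theoretical peak)")
```

With these placeholder inputs the estimate lands close to the quoted 11 teraops. A real design would fall short of it, since routing, control logic, and memory bandwidth all take their share of the fabric.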



