
Just skimmed the paper. It seems like this paper wants to optimize transformer inference end-to-end, i.e. from the ASIC level all the way up to the cloud.

I'm not exactly convinced, though, since all the results seem to be purely theoretical or simulated. I would've liked to see a prototype built across several FPGAs, with clock speeds extrapolated to ASICs.



I think FPGAs would make an awesome prototype, but maybe they're too constraining in terms of resources? The extrapolation might be so far out that it's no more accurate than their simulated model...
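
To make that concern concrete, here's a toy back-of-envelope sketch of what an FPGA-to-ASIC clock extrapolation looks like. All the numbers below are illustrative assumptions of mine, not from the paper:

    # Toy back-of-envelope: extrapolating ASIC clock from an FPGA prototype.
    # Every number here is an illustrative assumption, not from the paper.

    fpga_clock_mhz = 250          # assumed measurement on the FPGA prototype
    fpga_to_asic_speedup = 5.0    # assumed rule-of-thumb midpoint for the scaling
    speedup_uncertainty = 2.0     # assumed spread around that midpoint

    low = fpga_clock_mhz * fpga_to_asic_speedup / speedup_uncertainty
    mid = fpga_clock_mhz * fpga_to_asic_speedup
    high = fpga_clock_mhz * fpga_to_asic_speedup * speedup_uncertainty

    print(f"Extrapolated ASIC clock: {low:.0f}-{high:.0f} MHz (midpoint {mid:.0f})")
    # Extrapolated ASIC clock: 625-2500 MHz (midpoint 1250)

A 4x spread end to end with even modest uncertainty in the scaling factor, which is the point: the error bars on such an extrapolation can easily be as wide as those on the simulated model it's meant to validate.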


It seems fine to say "others have proved that this math makes a good LLM; we have designed an ASIC that can do this math fast; therefore we can make a good, fast LLM."


Yes, but saying that shouldn't be mistaken for "we can make an ASIC that runs some model fast". There's a wide implementation gap between the two.


Yep, it's a research paper in comp arch: the initial proof-of-concept study before you go and spend real money on it.



