
Parsing is faster than generating, so having a small model produce a whole output and then having Goliath produce only a single-token "good/bad" evaluation would be faster than having Goliath produce everything. This would be the extreme, ad hoc, iterative version of speculative decoding, which is already a thing and would probably give the best compromise.
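
A rough sketch of the draft-then-judge loop, to make the idea concrete. Everything here is a toy stand-in and an assumption on my part: `small_model`, `big_model_judge` and `big_model_generate` are hypothetical functions, not real model calls, and falling back to full generation by the big model after a few rejected drafts is my own addition, not something stated above.

```python
from typing import Callable, Tuple


def draft_then_judge(
    prompt: str,
    draft: Callable[[str, int], str],
    judge: Callable[[str, str], str],
    fallback: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[str, str]:
    """Let a cheap model draft; the expensive model only emits a verdict.

    Returns (text, source), where source is "draft" or "fallback".
    """
    for attempt in range(max_attempts):
        candidate = draft(prompt, attempt)
        # The big model reads prompt + candidate and answers with one token.
        if judge(prompt, candidate) == "good":
            return candidate, "draft"
    # Assumption: if every draft is rejected, pay for full generation.
    return fallback(prompt), "fallback"


# Toy stand-ins (hypothetical, for illustration only).
def small_model(prompt: str, attempt: int) -> str:
    # First draft is wrong, the retry is right.
    return "5" if attempt == 0 else "4"


def big_model_judge(prompt: str, candidate: str) -> str:
    # Pretend single-token verdict from the large model.
    return "good" if candidate == "4" else "bad"


def big_model_generate(prompt: str) -> str:
    # Pretend full (slow) generation by the large model.
    return "4"


if __name__ == "__main__":
    print(draft_then_judge("2 + 2 =", small_model, big_model_judge, big_model_generate))
```

The speedup, if any, depends on how often the drafts get accepted: every rejected draft costs a small-model generation plus a big-model read, and the fallback still costs a full big-model generation on top.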


