
I have long suspected that hosted model providers shed load under high demand, either by diverting some requests to lesser models or by serving more heavily quantized versions.


IIRC, OpenRouter saves tokens by summarizing queries through another model.
