
Eh, I'm testing it now and it seems a bit too fast to be the same size: almost 2x the tokens per second and a much lower time to first token.

There are other valid reasons it might be faster, but being faster even while everyone's rushing to try it at launch, plus a cost decrease, leaves me inclined to believe it's a smaller model than past Opus models.
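
For anyone who wants to check the two numbers above themselves, here is a rough sketch of how time to first token and tokens per second could be measured from any streaming response. It assumes the stream can be consumed as a plain Python iterable that yields one token (or chunk) at a time; `measure_stream` and `StreamStats` are hypothetical names made up for this example, not part of any vendor SDK, and chunks are not guaranteed to map one-to-one to tokens.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamStats(NamedTuple):
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # throughput measured after the first token arrives
    n_tokens: int        # total items received from the stream


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamStats:
    """Consume a token stream and time it (hypothetical helper).

    The clock is injectable so the function can be tested deterministically.
    """
    start = clock()
    first = None
    last = start
    n = 0
    for _ in tokens:
        last = clock()
        if first is None:
            first = last  # arrival time of the first token
        n += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    # Throughput counts only tokens after the first, so TTFT is not mixed in.
    tps = (n - 1) / decode_time if decode_time > 0 else float("nan")
    return StreamStats(first - start, tps, n)
```

A single run says little: server load, prompt length, and network latency all move both numbers, so comparing models this way only makes sense over repeated runs with the same prompt.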





It could be a combination of over-provisioning for early users, a smaller model, and more quantisation.

It does seem too fast to be a huge model, but it also is giving me the vibes of the typical Opus level of intelligence. So who knows.


