
To add: the one service that was fast enough on the LLM side was Cerebras. The time to first token (TTFT) is incredibly fast (200-300 ms) and throughput is 2000 tokens/s for an 8B model. Combined, that makes for a great conversational experience.
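
For a rough sense of what those numbers mean per reply, here's a back-of-the-envelope sketch. It assumes the full reply takes roughly TTFT plus tokens divided by throughput, and ignores network and any speech/audio stages. The 100-token reply length and the function name are my own illustrative picks, not measurements.

```python
def response_latency_ms(ttft_ms: float, tokens_per_second: float, reply_tokens: int) -> float:
    """Rough time until the whole reply is generated, in milliseconds.

    Simplified model (an assumption): total = TTFT + tokens / throughput.
    """
    generation_ms = reply_tokens * 1000.0 / tokens_per_second
    return ttft_ms + generation_ms


# TTFT range and throughput from the comment above; 100 tokens is a hypothetical reply length.
for ttft_ms in (200, 300):
    total = response_latency_ms(ttft_ms, tokens_per_second=2000, reply_tokens=100)
    print(f"TTFT {ttft_ms} ms -> full reply in about {total:.0f} ms")
```

Under that model, generation adds only about 50 ms for a short reply, so TTFT dominates the wait.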


