
This is with Together's API via OpenRouter, running DeepSeek V3 0324 and Kimi K2 0905.

I didn't set a top-k. So it seems like Together must be doing something weird in their speculative decoding implementation.





Oh, in that case there is definitely a top-k or top-p behind the scenes; it might just not be exposed to the user as a param they can change through the API. I haven't heard of anyone running an LLM in prod with actual pure sampling.

I see. That's slightly unfortunate. In principle, increasing temperature flattens out the distribution but the ordering between different tokens' probabilities remains the same, so setting a top-k shouldn't break my test. Can't say the same for top-p, though. And all of this is probably too deep into the provider's implementation details for me to make assumptions about.
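
For what it's worth, here's a minimal sketch of that reasoning (plain NumPy, made-up logits, not any provider's actual sampler): dividing the logits by the temperature is a monotonic transform, so the top-k set never changes, but the top-p (nucleus) set can grow because the flattened distribution needs a longer prefix of tokens to reach the same cumulative mass.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def top_k_set(probs, k):
        # indices of the k highest-probability tokens
        return {int(i) for i in np.argsort(probs)[::-1][:k]}

    def top_p_set(probs, p):
        # smallest prefix of tokens (by descending probability) whose mass reaches p
        order = np.argsort(probs)[::-1]
        cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
        return {int(i) for i in order[:cutoff]}

    logits = np.array([4.0, 3.0, 2.0, 1.0, 0.0])  # hypothetical logits for 5 tokens

    for T in (1.0, 2.0):
        probs = softmax(logits / T)
        print(f"T={T}: top-2 {top_k_set(probs, 2)}, p=0.9 nucleus {top_p_set(probs, 0.9)}")

    # The top-2 set is {0, 1} at both temperatures (ordering is preserved),
    # but the 0.9 nucleus grows from 3 tokens at T=1.0 to 4 tokens at T=2.0.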


