
I’m not sure comparing it to the greedy algorithm is the right way to think about it. Even after applying the temperature, greedy decoding still picks the output with the highest probability, because dividing the logits by a positive temperature doesn’t change their ordering. Where temperature actually has an effect is when you use a sampling method, such as sampling from a multinomial distribution or nucleus sampling.
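
To make that concrete, here is a rough sketch (not taken from the article). The logits are made-up numbers and the helper names are mine. It assumes temperature is applied as softmax(logits / T), which is the usual convention.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(probs):
    """Greedy decoding: index of the highest-probability entry."""
    return max(range(len(probs)), key=probs.__getitem__)


def sample_pick(probs, rng):
    """Multinomial sampling: draw one index in proportion to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical values, for illustration only
rng = random.Random(0)

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    draws = [sample_pick(probs, rng) for _ in range(1000)]
    # The greedy index is the same for every T; the probabilities and
    # the share of draws landing on index 0 change with T.
    print(t, greedy_pick(probs), [round(p, 3) for p in probs], draws.count(0))
```

The greedy pick stays on the same index at every temperature, while the sampled draws spread out as T grows.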



The analogy is apt. In SA (simulated annealing), the temperature is used to make the distribution sharper or flatter. In this article, the temperature is used in the same way.
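
A small sketch of why the two look alike, assuming the textbook forms: a Boltzmann distribution p_i ∝ exp(-E_i / T) and the Metropolis acceptance rule for SA. The function names and numbers are mine, not from the article.

```python
import math


def boltzmann(energies, temperature):
    """Boltzmann distribution: p_i proportional to exp(-E_i / T)."""
    e_min = min(energies)  # shift so every exponent is <= 0
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def metropolis_accept_prob(delta_e, temperature):
    """Chance of accepting a move that changes the energy by delta_e."""
    if delta_e <= 0:
        return 1.0  # improving (or neutral) moves are always accepted
    return math.exp(-delta_e / temperature)


# Treating negative logits as energies gives the same numbers as
# softmax(logits / T), so the temperature plays the same role in both.
logits = [2.0, 1.0, 0.1]  # hypothetical values
energies = [-x for x in logits]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in boltzmann(energies, t)],
          round(metropolis_accept_prob(1.0, t), 3))
```

Low T concentrates the mass on the lowest-energy state and rarely accepts a worse move; high T flattens the distribution and accepts worse moves more often.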



