
I think it’s effectively built into the design. The model outputs a probability distribution for the first unknown token [0]. Then some code outside the model chooses a token and runs the model again with that token appended to the input. So the second output token’s probability distribution is automatically conditioned on the first output token, and so on.
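
Roughly, the loop looks like this (a minimal sketch, not any particular library's API; model, tokenizer, prompt, temperature, and max_new_tokens are placeholders):

    import torch

    tokens = tokenizer.encode(prompt)                    # prompt token ids (illustrative API)
    for _ in range(max_new_tokens):
        logits = model(torch.tensor([tokens]))[0, -1]    # logits for the next unknown token
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1).item()
        tokens.append(next_token)                        # next step conditions on this choice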

Sometimes people will attempt to parallelize this by using a faster model to guess a few tokens ahead and then evaluating them as a batch with the main model to determine whether the guesses were good.
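
A rough sketch of that trick (speculative decoding) in its simplest greedy form, continuing with the tokens list from the loop above; draft_model and main_model are placeholders that return logits for every position:

    k = 4                                         # how many tokens the fast model guesses
    guesses, ctx = [], list(tokens)
    for _ in range(k):
        nxt = int(draft_model(ctx).argmax())      # fast model's best guess for the next token
        guesses.append(nxt)
        ctx.append(nxt)
    logits = main_model(tokens + guesses)         # one pass of the main model scores all guesses
    for i, g in enumerate(guesses):
        if int(logits[len(tokens) + i - 1].argmax()) != g:
            break                                 # first disagreement: discard the rest
        tokens.append(g)

The published versions also keep the main model's own prediction at the first mismatch, so a verification pass never yields fewer tokens than ordinary decoding; this sketch skips that.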

[0] Usually it outputs “logits”, which become a probability distribution when combined with a “temperature” parameter.




> I think it’s effectively built into the design.

It isn't. There is no guarantee that successive tokens will be comprehensible.

> Usually it outputs “logits”, which become a probability distribution when combined with a “temperature” parameter.

The logits are the probability distribution (well technically, you would apply softmax). Temperature is a parameter for how you sample those logits in a non-greedy fashion.
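
Concretely, something like this (PyTorch-flavoured sketch; logits and temp are placeholders):

    # non-greedy: sample from the temperature-scaled distribution
    tok = torch.multinomial(torch.softmax(logits / temp, dim=-1), num_samples=1)
    # greedy: just take the argmax, temp plays no role
    tok = logits.argmax()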


> Temperature is a parameter for how you sample those logits in a non-greedy fashion.

I think temperature is better understood as a pre-softmax scaling of the logits. You'd divide the logits by the temp, and then their softmax becomes more/less peaky.

    probs = (logits / temp).softmax(dim=-1)
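
For example (PyTorch, numbers purely illustrative):

    logits = torch.tensor([2.0, 1.0, 0.0])
    (logits / 0.5).softmax(dim=-1)   # ~[0.87, 0.12, 0.02]  (peakier)
    (logits / 2.0).softmax(dim=-1)   # ~[0.51, 0.31, 0.19]  (flatter)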
Sampling is a whole different thing.


Sure, my comment about softmax was simply about the probability distribution. But temperature is still part of sampling. If you’re greedy decoding, temperature doesn’t matter.
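
Easy to check: dividing the logits by any positive temperature can't change which one is largest, so argmax picks the same token either way (PyTorch, illustrative):

    import torch
    logits = torch.tensor([2.0, 1.0, 0.0])
    torch.equal((logits / 0.5).argmax(), logits.argmax())   # True for any temp > 0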



