
I imagine he means that when you reason in latent space, the final answer is a smooth (differentiable) function of the parameters, so you can use gradient descent to directly optimize the model to produce the desired final output without knowing the correct reasoning steps to get there.
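
A minimal PyTorch sketch of that point (the sizes, the `step`/`readout` modules, and the toy loss are all made up for illustration): if every reasoning step is a differentiable map on a hidden vector, a loss on the final output backpropagates through all of the steps at once.

    import torch
    import torch.nn as nn

    # Toy "reasoning step": any differentiable map on a latent vector.
    step = nn.Linear(16, 16)
    readout = nn.Linear(16, 10)

    z = torch.randn(1, 16)            # initial latent state
    for _ in range(5):                # five latent reasoning steps
        z = torch.tanh(step(z))

    logits = readout(z)
    loss = nn.functional.cross_entropy(logits, torch.tensor([3]))
    loss.backward()                   # gradients flow through every step
    print(step.weight.grad.abs().sum())  # nonzero: end-to-end trainable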

When you reason in token space (as everyone is doing now), sampling a discrete token after each step is a non-differentiable operation, so gradients can't flow back through the chain of reasoning and you have to use some kind of reinforcement learning algorithm to learn the weights.
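
Continuing the same toy setup (again, all names and sizes here are illustrative, not anyone's actual architecture): sampling a discrete token between steps cuts the autograd graph, so plain backprop can no longer assign credit across steps.

    import torch
    import torch.nn as nn

    step = nn.Linear(16, 16)
    readout = nn.Linear(16, 10)
    embed = nn.Embedding(10, 16)

    z = torch.randn(1, 16)
    for _ in range(5):
        probs = readout(z).softmax(-1)
        tok = torch.multinomial(probs, 1).squeeze(-1)  # discrete sample: no gradient
        z = torch.tanh(step(embed(tok)))               # graph is cut at `tok`

    loss = nn.functional.cross_entropy(readout(z), torch.tensor([3]))
    loss.backward()
    # Gradient reaches each parameter only through its final use; nothing
    # propagates back through the earlier sampled tokens.
    print(embed.weight.grad)  # nonzero only at the last sampled token's row

Since nothing upstream of a sampled token receives a gradient, methods like REINFORCE or PPO estimate those missing gradients from rewards on the sampled outputs instead.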
